* New optimisations for long raw strings in C++ Mode. @ 2022-08-06 21:29 Alan Mackenzie 2022-08-07 12:49 ` Lars Ingebrigtsen 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2022-08-06 21:29 UTC (permalink / raw) To: emacs-devel, Lars Ingebrigtsen Hello, Lars, hello, Emacs. I've just committed some optimisations to the savannah master branch which should speed up the handling of long C++ raw strings in an Emacs where font-locking is fully working. On a megabyte long raw string in a normal build on a reasonably modern machine, the response is slightly sluggish, but not unreasonable to somebody who prefers correct fontification to instant response. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-06 21:29 New optimisations for long raw strings in C++ Mode Alan Mackenzie @ 2022-08-07 12:49 ` Lars Ingebrigtsen 2022-08-07 13:25 ` Alan Mackenzie 0 siblings, 1 reply; 45+ messages in thread From: Lars Ingebrigtsen @ 2022-08-07 12:49 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel Alan Mackenzie <acm@muc.de> writes: > I've just committed some optimisations to the savannah master branch > which should speed up the handling of long C++ raw strings in an Emacs > where font-locking is fully working. > > On a megabyte long raw string in a normal build on a reasonably modern > machine, the response is slightly sluggish, but not unreasonable to > somebody who prefers correct fontification to instant response. I tried the same test case as before, yanking a 1MB long line into a raw string, and Emacs is still hanging after 20s (at which point I'm killing it). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode.
From: Alan Mackenzie @ 2022-08-07 13:25 UTC
To: Lars Ingebrigtsen; +Cc: emacs-devel

Hello, Lars.

On Sun, Aug 07, 2022 at 14:49:40 +0200, Lars Ingebrigtsen wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > I've just committed some optimisations to the savannah master branch
> > which should speed up the handling of long C++ raw strings in an Emacs
> > where font-locking is fully working.

> > On a megabyte long raw string in a normal build on a reasonably modern
> > machine, the response is slightly sluggish, but not unreasonable to
> > somebody who prefers correct fontification to instant response.

> I tried the same test case as before, yanking a 1MB long line into a raw
> string, and Emacs is still hanging after 20s (at which point I'm killing
> it).

Hmm.  Something is different on my setup from yours.  After my
optimisations, I do:

(i) Emacs -Q
(ii) M-: (setq long-line-threshold nil)
(ii) C-x C-f ~/long-line.cc RET; This is a file containing a 1MB raw
  string.
(iii) This loads and displays in somewhere between 1 and 2 seconds.
(iv) C-x 5 b long-line2.cc RET M-x c++-mode RET
(v) Type in char long_line [] = R"foo(
(vi) Type in RET, twice
(vii) Type in )foo";
(ix) C-x 5 o
(x) Get the long line into the kill ring, with movement commands and
  M-w.
(xi) C-x 5 o.
(xii) Put point on the blank line 2.
(xiii) C-y.
(xiv) This takes less than a second to display.

What does your 1MB string look like?  Does it contain lots of "s?  CC
Mode needs to search through the yanked line for the first occurrence of
)foo", which is why it is not instantaneous.  It also has to put a
syntax-table text property on each " which isn't the terminating )foo",
which is another reason it's not instantaneous.

-- 
Alan Mackenzie (Nuremberg, Germany).
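The two scans mentioned in the message above (finding the first `)foo"` after the opener, and visiting every other `"` inside the raw string) can be sketched in a few lines of Python. This is only an illustration of why the work grows with the length of the yanked text; it is not CC Mode's code, and the function name `scan_raw_string` is made up for the example.

```python
def scan_raw_string(text, open_pos):
    """Scan a C++ raw string whose R"delim( opener starts at open_pos.

    Returns (close_pos, inner_quotes): the index of the terminating
    )delim" (or None if the string is unterminated) and the indices of
    every '"' in the body that is not part of that terminator.
    """
    assert text.startswith('R"', open_pos)
    paren = text.index("(", open_pos)          # end of the R"delim( opener
    delim = text[open_pos + 2:paren]           # e.g. "foo"
    closer = ")" + delim + '"'
    body_start = paren + 1
    close_pos = text.find(closer, body_start)  # the first occurrence terminates the string
    body_end = len(text) if close_pos == -1 else close_pos
    # Every quote in the body would need special marking in an editor.
    inner_quotes = [i for i in range(body_start, body_end) if text[i] == '"']
    return (None if close_pos == -1 else close_pos), inner_quotes


src = 'char long_line[] = R"foo(\nsay "hi"\n)foo";'
close_pos, quotes = scan_raw_string(src, src.index('R"'))
print(close_pos, quotes)
```

Both the search for the closer and the collection of inner quotes touch each character of the body once, so a body of 1MB means roughly a million character inspections per scan.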
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-07 13:25 ` Alan Mackenzie @ 2022-08-07 13:34 ` Lars Ingebrigtsen 2022-08-07 14:40 ` Alan Mackenzie 0 siblings, 1 reply; 45+ messages in thread From: Lars Ingebrigtsen @ 2022-08-07 13:34 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel Alan Mackenzie <acm@muc.de> writes: > Hmm. Something is different on my setup from yours. After my > optimisations, I do: Is this on the current trunk? > (i) Emacs -Q > (ii) M-: (setq long-line-threshold nil) > (ii) C-x C-f ~/long-line.cc RET; This is a file containing a 1MB raw > string. > (iii) This loads and displays in somewhere between 1 and 2 seconds. > (iv) C-x 5 b long-line2.cc RET M-x c++-mode RET > (v) Type in char long_line [] = R"foo( > (vi) Type in RET, twice > (vii) Type in )foo"; Yup. > (ix) C-x 5 o > (x) Get the long line into the kill ring, with movement commands and > M-w. > (xi) C-x 5 o. > (xii) Put point on the blank line 2. > (xiii) C-y. > (xiv) This takes less than a second to display. It's still hanging for me after 20s. > What does your 1MB string look like? Does it contain lots of "s? CC > Mode needs to search through the yanked line for the first occurrence of > )foo", which is why it is not instantaneous. It also has to put a > syntax-table text property on each " which isn't the terminating )foo", > which is another reason it's not instantaneous. The string I'm yanking is just 1MB worth of x and y characters (and no spaces or anything). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-07 13:34 ` Lars Ingebrigtsen @ 2022-08-07 14:40 ` Alan Mackenzie 2022-08-07 14:41 ` Lars Ingebrigtsen ` (2 more replies) 0 siblings, 3 replies; 45+ messages in thread From: Alan Mackenzie @ 2022-08-07 14:40 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: emacs-devel Hello, Lars. On Sun, Aug 07, 2022 at 15:34:45 +0200, Lars Ingebrigtsen wrote: > Alan Mackenzie <acm@muc.de> writes: > > Hmm. Something is different on my setup from yours. After my > > optimisations, I do: > Is this on the current trunk? Yes, after the committing of commit a332034160bf8e1f38039cd2d37898de6f94508f Author: Alan Mackenzie <acm@muc.de> Date: Sun Aug 7 12:26:16 2022 +0000 CC Mode: Fix looping in patch from yesterday > > (i) Emacs -Q > > (ii) M-: (setq long-line-threshold nil) > > (ii) C-x C-f ~/long-line.cc RET; This is a file containing a 1MB raw > > string. > > (iii) This loads and displays in somewhere between 1 and 2 seconds. > > (iv) C-x 5 b long-line2.cc RET M-x c++-mode RET > > (v) Type in char long_line [] = R"foo( > > (vi) Type in RET, twice > > (vii) Type in )foo"; > Yup. > > (ix) C-x 5 o > > (x) Get the long line into the kill ring, with movement commands and > > M-w. > > (xi) C-x 5 o. > > (xii) Put point on the blank line 2. > > (xiii) C-y. > > (xiv) This takes less than a second to display. > It's still hanging for me after 20s. > > What does your 1MB string look like? Does it contain lots of "s? CC > > Mode needs to search through the yanked line for the first occurrence of > > )foo", which is why it is not instantaneous. It also has to put a > > syntax-table text property on each " which isn't the terminating )foo", > > which is another reason it's not instantaneous. > The string I'm yanking is just 1MB worth of x and y characters (and no > spaces or anything). Mine looked like "012345678 012345678 012345678 .... 012345678 ". 
I tried replacing each space by a 9, but the string still yanks with the expected minimal sluggishness. Actually, I just tried it again after M-: (setq long-line-threshold 10000). It hangs. Did you omit step (ii) above, by any chance? -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-07 14:40 ` Alan Mackenzie @ 2022-08-07 14:41 ` Lars Ingebrigtsen 2022-08-07 14:54 ` Lars Ingebrigtsen 2022-08-07 15:00 ` Eli Zaretskii 2 siblings, 0 replies; 45+ messages in thread From: Lars Ingebrigtsen @ 2022-08-07 14:41 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel Alan Mackenzie <acm@muc.de> writes: > Actually, I just tried it again after M-: (setq long-line-threshold > 10000). It hangs. Did you omit step (ii) above, by any chance? I tried both, and it hangs in both cases. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-07 14:40 ` Alan Mackenzie 2022-08-07 14:41 ` Lars Ingebrigtsen @ 2022-08-07 14:54 ` Lars Ingebrigtsen 2022-08-07 16:13 ` Alan Mackenzie 2022-08-07 15:00 ` Eli Zaretskii 2 siblings, 1 reply; 45+ messages in thread From: Lars Ingebrigtsen @ 2022-08-07 14:54 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel Simplest case to reproduce: M-: (setq long-line-threshold nil) Go to a cc buffer containing: char long_line[] = R"foo( )foo" M-: (insert (make-string 1000000 ?y)) on the second line. This reliably hangs Emacs for me. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode.
From: Alan Mackenzie @ 2022-08-07 16:13 UTC
To: Lars Ingebrigtsen; +Cc: emacs-devel

Hello, Lars.

On Sun, Aug 07, 2022 at 16:54:37 +0200, Lars Ingebrigtsen wrote:
> Simplest case to reproduce:

> M-: (setq long-line-threshold nil)

> Go to a cc buffer containing:

> char long_line[] = R"foo(

> )foo"

> M-: (insert (make-string 1000000 ?y))

> on the second line.  This reliably hangs Emacs for me.

Actually, when I try your recipe with make-string, it hangs for me, too,
for quite a few minutes.  Alternatively, when I do C-y, the insertion
takes a little less than a second.

In the (insert <string>) recipe, when it finally finishes, I see the
following in *Messages*:

Error during redisplay: (jit-lock-function 27) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 1527) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 3027) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 4527) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 6027) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 7527) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 1000027) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 9027) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 10527) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 12027) signaled (error "Stack overflow in regexp matcher")
Error during redisplay: (jit-lock-function 13527) signaled (error "Stack overflow in regexp matcher")

.... and so on.  So it's a bug needing fixing, rather than systematic
slowness.

-- 
Alan Mackenzie (Nuremberg, Germany).
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-07 16:13 ` Alan Mackenzie @ 2022-08-07 16:17 ` Eli Zaretskii 2022-08-09 11:00 ` Alan Mackenzie 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-08-07 16:17 UTC (permalink / raw) To: Alan Mackenzie; +Cc: larsi, emacs-devel > Date: Sun, 7 Aug 2022 16:13:56 +0000 > Cc: emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > > Simplest case to reproduce: > > > M-: (setq long-line-threshold nil) > > > Go to a cc buffer containing: > > > char long_line[] = R"foo( > > > )foo" > > > M-: (insert (make-string 1000000 ?y)) > > > on the second line. This reliably hangs Emacs for me. > > Actually, when I try your recipe with make-string, it hangs for me, too, > for quite a few minutes. Alternatively, when I do C-y, the insertion > takes a little less than a second. I actually tried the "C-y" recipe, and it hangs for me that way, too. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode.
From: Alan Mackenzie @ 2022-08-09 11:00 UTC
To: Eli Zaretskii; +Cc: larsi, emacs-devel

Hello, Eli and Lars.

On Sun, Aug 07, 2022 at 19:17:50 +0300, Eli Zaretskii wrote:
> > Date: Sun, 7 Aug 2022 16:13:56 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > > Simplest case to reproduce:

> > > M-: (setq long-line-threshold nil)

> > > Go to a cc buffer containing:

> > > char long_line[] = R"foo(

> > > )foo"

> > > M-: (insert (make-string 1000000 ?y))

> > > on the second line.  This reliably hangs Emacs for me.

> > Actually, when I try your recipe with make-string, it hangs for me, too,
> > for quite a few minutes.  Alternatively, when I do C-y, the insertion
> > takes a little less than a second.

> I actually tried the "C-y" recipe, and it hangs for me that way, too.

The problem was not so much the long line itself, but that there were
too many contiguous letters in it.  This caused (repeated) overflow of
the regexp engine stack in jit-lock.

Just as a matter of interest, my functions (not yet committed) for
dumping backtraces after a redisplay error were very useful in
identifying the problematic regexp.

I fixed this regexp by replacing a "*" by a "\\{,1000\\}", in the hope
that nobody will want an identifier longer than 1000 characters.

I've committed the fix.  Now the sequence

(i) emacs -Q
(ii) M-: (setq long-line-threshold nil)
(iii) Insert opening and closing raw string delimiters into a C++ Mode
  buffer.
(iv) Put point between the delimiters.
(v) M-: (insert (make-string 1000000 ?y))

works in a reasonable amount of time (the last step takes under a second
on my system), especially considering that the code scans the inserted
string for a closing raw string delimiter.

-- 
Alan Mackenzie (Nuremberg, Germany).
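The effect of swapping an unbounded "*" for a bounded repetition can be shown with a small Python sketch. The identifier patterns below are hypothetical stand-ins, since the thread does not quote the CC Mode regexp that was changed, and Python's `{0,1000}` plays the role of Emacs's `\\{,1000\\}`. Python's regexp engine does not overflow on this input the way the Emacs matcher did, so the sketch only demonstrates how the bound caps the amount of text a single match can consume.

```python
import re

# Hypothetical identifier patterns; the real CC Mode regexp is not shown in the thread.
UNBOUNDED = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
BOUNDED = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,1000}")


def match_length(pattern, text):
    """Length of the match anchored at the start of text, or 0 if there is none."""
    m = pattern.match(text)
    return 0 if m is None else m.end() - m.start()


line = "y" * 1_000_000
print(match_length(UNBOUNDED, line))  # consumes the whole run of letters
print(match_length(BOUNDED, line))    # stops after 1 + 1000 characters
```

The trade-off is the one stated in the message: an identifier longer than the bound would no longer be matched as a single token.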
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-09 11:00 ` Alan Mackenzie @ 2022-08-09 15:35 ` Lars Ingebrigtsen 2022-08-09 15:38 ` Lars Ingebrigtsen 0 siblings, 1 reply; 45+ messages in thread From: Lars Ingebrigtsen @ 2022-08-09 15:35 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Eli Zaretskii, emacs-devel Alan Mackenzie <acm@muc.de> writes: > I've committed the fix. Now the sequence > (i) emacs -Q > (ii) M-: (setq long-line-threshold nil) > (iii) Insert opening and closing raw string delimiters into a C++ Mode > buffer. > (iv) Put point between the delimiters. > (v) M-: (insert (make-string 1000000 ?y)) > > works in a reasonable amount of time (the last step takes under a second > on my system), especially considering that the code scans the inserted > string for a closing raw string delimiter. I can confirm that this is fast now; thanks. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-09 15:35 ` Lars Ingebrigtsen @ 2022-08-09 15:38 ` Lars Ingebrigtsen 2022-08-09 16:05 ` Alan Mackenzie 2022-08-10 13:25 ` Eli Zaretskii 0 siblings, 2 replies; 45+ messages in thread From: Lars Ingebrigtsen @ 2022-08-09 15:38 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Eli Zaretskii, emacs-devel Lars Ingebrigtsen <larsi@gnus.org> writes: > I can confirm that this is fast now; thanks. But opening the resulting file with "emacs -Q" and then doing a `M->' now hangs Emacs, which it didn't use to do, I think? (But `C-g' allows breaking and Emacs is responsive afterwards, so it's not that serious.) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-09 15:38 ` Lars Ingebrigtsen @ 2022-08-09 16:05 ` Alan Mackenzie 2022-08-09 16:34 ` Eli Zaretskii 2022-08-10 13:25 ` Eli Zaretskii 1 sibling, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2022-08-09 16:05 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel Hello, Lars. On Tue, Aug 09, 2022 at 17:38:14 +0200, Lars Ingebrigtsen wrote: > Lars Ingebrigtsen <larsi@gnus.org> writes: > > I can confirm that this is fast now; thanks. :-) > But opening the resulting file with "emacs -Q" and then doing a `M->' > now hangs Emacs, which it didn't use to do, I think? If you do (setq long-line-threshold nil), M-> is fast. Only when you omit it does Emacs hang. Hmm. It's meant to be the other way around, isn't it? ;-) > (But `C-g' allows breaking and Emacs is responsive afterwards, so it's > not that serious.) What isn't so clever is that with a long line of y's, C-p, C-n, M-v, C-v are slower than inserting the long line. The profiler on M-v indicated that scroll-down-command was taking 86% of the time. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode.
From: Eli Zaretskii @ 2022-08-09 16:34 UTC
To: Alan Mackenzie; +Cc: larsi, emacs-devel

> Date: Tue, 9 Aug 2022 16:05:40 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > But opening the resulting file with "emacs -Q" and then doing a `M->'
> > now hangs Emacs, which it didn't use to do, I think?
>
> If you do (setq long-line-threshold nil), M-> is fast.  Only when you
> omit it does Emacs hang.  Hmm.  It's meant to be the other way around,
> isn't it?  ;-)

Yes.  I'm guessing there's some additional bug there.

> What isn't so clever is that with a long line of y's, C-p, C-n, M-v, C-v
> are slower than inserting the long line.  The profiler on M-v indicated
> that scroll-down-command was taking 86% of the time.

If this is with long-line-threshold set to nil, then you have just
re-discovered the problematic performance of Emacs with very long
lines, which long-line-threshold and the resulting selective narrowing
are designed to fix.

Btw, C-n/C-p don't call scroll-down-command, so I guess that was only
shown in the profile for M-v.
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-09 16:34 ` Eli Zaretskii @ 2022-08-09 20:39 ` Gregory Heytings 2022-08-09 21:43 ` Alan Mackenzie 0 siblings, 1 reply; 45+ messages in thread From: Gregory Heytings @ 2022-08-09 20:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Alan Mackenzie, larsi, emacs-devel >>> But opening the resulting file with "emacs -Q" and then doing a `M->' >>> now hangs Emacs, which it didn't use to do, I think? >> >> If you do (setq long-line-threshold nil), M-> is fast. Only when you >> omit it does Emacs hang. Hmm. It's meant to be the other way around, >> isn't it? ;-) > > Yes. I'm guessing there's some additional bug there. > As has been discussed, CC Mode is, sadly, by design, incompatible with the new feature (and I wonder what the ";-)" above is supposed to convey). It insists on accessing the whole buffer, and doesn't downgrade gracefully when it can't. In this case, with emacs -Q, after M->, (jit-lock-fontify-now 999600 1001100) is called, which calls (c-font-lock-fontify-region 999600 1000034), which does (widen), and calls (font-lock-default-fontify-region 991200 1000034) because these (991200 and 1000034) are the bounds of the locked narrowing. For some reason (which I don't have the patience to track down), because that (widen), which shouldn't be there in the first place, doesn't do what the function expects it to do, font-lock-default-fontify-region never ends. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-09 20:39 ` Gregory Heytings @ 2022-08-09 21:43 ` Alan Mackenzie 2022-08-09 23:05 ` Stefan Monnier ` (2 more replies) 0 siblings, 3 replies; 45+ messages in thread From: Alan Mackenzie @ 2022-08-09 21:43 UTC (permalink / raw) To: Gregory Heytings; +Cc: Eli Zaretskii, larsi, emacs-devel Hello, Gregory. On Tue, Aug 09, 2022 at 20:39:51 +0000, Gregory Heytings wrote: > >>> But opening the resulting file with "emacs -Q" and then doing a `M->' > >>> now hangs Emacs, which it didn't use to do, I think? > >> If you do (setq long-line-threshold nil), M-> is fast. Only when you > >> omit it does Emacs hang. Hmm. It's meant to be the other way around, > >> isn't it? ;-) > > Yes. I'm guessing there's some additional bug there. > As has been discussed, CC Mode is, sadly, by design, incompatible with the > new feature ..... Er, actually, CC Mode has been around a tad longer than the new feature. It would be more accurate to state that the new feature was, by design, incompatible with existing software. The new feature, by design, breaks long-standing contracts in Emacs, namely that `widen', etc., work. Of course, testing could have shown up this incompatibility at an early stage, perhaps even leading to a solution. A pity we didn't have more thorough testing. So, what do you intend to do about this incompatibility you have introduced? Anything? > .... (and I wonder what the ";-)" above is supposed to convey). The irony of a supposed optimisation causing software to hang. > It insists on accessing the whole buffer, .... There's no need to anthropomorphise. A major mode accesses its buffer. What will we have next! > .... and doesn't downgrade gracefully when it can't. Yeah, it depends on defined functionality working. If only its designers had been clever enough 20 years ago to foresee that parts of Emacs would stop working as documented ..... 
> In this case, with emacs -Q, after M->, (jit-lock-fontify-now 999600 > 1001100) is called, which calls (c-font-lock-fontify-region 999600 > 1000034), which does (widen), and calls > (font-lock-default-fontify-region 991200 1000034) because these > (991200 and 1000034) are the bounds of the locked narrowing. For some > reason (which I don't have the patience to track down), because that > (widen), which shouldn't be there in the first place,.... Yeah, it would be convenient for you if everybody followed your (controversial) desires, rather than what's advertised in the Elisp manual. However, you knew before constructing your new feature that major modes use widen, and went ahead and broke it anyway. Still, I suppose having rapid processing of monster buffers is more important than longstanding software continuing to work, so that's all right, then. > .... doesn't do what the function expects it to do, > font-lock-default-fontify-region never ends. What do you intend to do about this? Anything? -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-09 21:43 ` Alan Mackenzie @ 2022-08-09 23:05 ` Stefan Monnier 2022-08-10 2:43 ` Eli Zaretskii 2022-08-10 7:42 ` Gregory Heytings 2022-08-10 13:28 ` Eli Zaretskii 2 siblings, 1 reply; 45+ messages in thread From: Stefan Monnier @ 2022-08-09 23:05 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Gregory Heytings, Eli Zaretskii, larsi, emacs-devel > Er, actually, CC Mode has been around a tad longer than the new feature. > > It would be more accurate to state that the new feature was, by design, > incompatible with existing software. The new feature, by design, breaks > long-standing contracts in Emacs, namely that `widen', etc., work. > > Of course, testing could have shown up this incompatibility at an early > stage, perhaps even leading to a solution. A pity we didn't have more > thorough testing. The long-lines code is not particularly concerned with CC-mode, tho. In my experience long lines occur *extremely* rarely in files handled by CC-mode, so it's not particularly important if this case is handled well or not. > So, what do you intend to do about this incompatibility you have > introduced? Anything? I think "nothing" is a perfectly valid choice at this point. Of course, fixing CC-mode so that its font-lock (and indentation) code doesn't call `widen` would be good in any case (also for use in MMM-mode). I understand it can be somewhat annoying since the code that calls `widen` might also be called from other places than indentation and font-lock. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-09 23:05 ` Stefan Monnier @ 2022-08-10 2:43 ` Eli Zaretskii 0 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-08-10 2:43 UTC (permalink / raw) To: Stefan Monnier; +Cc: acm, gregory, larsi, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Gregory Heytings <gregory@heytings.org>, Eli Zaretskii <eliz@gnu.org>, > larsi@gnus.org, emacs-devel@gnu.org > Date: Tue, 09 Aug 2022 19:05:58 -0400 > > > So, what do you intend to do about this incompatibility you have > > introduced? Anything? > > I think "nothing" is a perfectly valid choice at this point. Agreed. > Of course, fixing CC-mode so that its font-lock (and indentation) code > doesn't call `widen` would be good in any case (also for use in MMM-mode). Agreed again, but given Alan's opinions, this is unlikely to happen. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode.
From: Gregory Heytings @ 2022-08-10 7:42 UTC
To: Alan Mackenzie; +Cc: Eli Zaretskii, larsi, emacs-devel

>
> So, what do you intend to do about this incompatibility you have
> introduced?
>

As I already said a few days ago, this is not my problem.  There is no
possible solution for modes which don't use the Emacs interfaces as they
are supposed to be used, except disabling the new feature, which is 100%
backward compatible, in these modes.

In the meantime two other people (Stefan and Eli) explained, with
different words and viewpoints, why CC Mode should not do what it does.
You made it very clear that you don't care.  End of story.

>> .... (and I wonder what the ";-)" above is supposed to convey).
>
> The irony of a supposed optimisation causing software to hang.
>

The long line optimizations do not claim to be a solution to all
possible hangs.  Doing that is theoretically impossible, as you may
know.  They only claim to be a practical solution to alleviate a
particular class of hangs.

>
> it would be convenient for you if everybody followed your
> (controversial) desires,
>

I have no desires whatsoever.  I worked (and still work) to fix a
long-standing bug in Emacs, by which I was, incidentally, not affected
myself.
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-09 21:43 ` Alan Mackenzie 2022-08-09 23:05 ` Stefan Monnier 2022-08-10 7:42 ` Gregory Heytings @ 2022-08-10 13:28 ` Eli Zaretskii 2022-08-10 16:23 ` Alan Mackenzie 2 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-08-10 13:28 UTC (permalink / raw) To: Alan Mackenzie; +Cc: gregory, larsi, emacs-devel > Date: Tue, 9 Aug 2022 21:43:28 +0000 > Cc: Eli Zaretskii <eliz@gnu.org>, larsi@gnus.org, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > > As has been discussed, CC Mode is, sadly, by design, incompatible with the > > new feature ..... > > Er, actually, CC Mode has been around a tad longer than the new feature. That just means it had been doing what it shouldn't for a very long time. It doesn't get any good points for that. > It would be more accurate to state that the new feature was, by design, > incompatible with existing software. The new feature, by design, breaks > long-standing contracts in Emacs, namely that `widen', etc., work. > > Of course, testing could have shown up this incompatibility at an early > stage, perhaps even leading to a solution. A pity we didn't have more > thorough testing. > > So, what do you intend to do about this incompatibility you have > introduced? Anything? Actually, I'd expect you, as the maintainer of CC Mode, to look into this and try to fix whatever needs fixing. (But for now I cannot reproduce the problem, so maybe there's nothing to fix.) > What do you intend to do about this? Anything? How about you? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 13:28 ` Eli Zaretskii @ 2022-08-10 16:23 ` Alan Mackenzie 2022-08-10 16:35 ` Eli Zaretskii 2022-08-10 17:19 ` Gregory Heytings 0 siblings, 2 replies; 45+ messages in thread From: Alan Mackenzie @ 2022-08-10 16:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gregory, larsi, emacs-devel Hello, Eli. On Wed, Aug 10, 2022 at 16:28:09 +0300, Eli Zaretskii wrote: > > Date: Tue, 9 Aug 2022 21:43:28 +0000 > > Cc: Eli Zaretskii <eliz@gnu.org>, larsi@gnus.org, emacs-devel@gnu.org > > From: Alan Mackenzie <acm@muc.de> > > > As has been discussed, CC Mode is, sadly, by design, incompatible with the > > > new feature ..... > > Er, actually, CC Mode has been around a tad longer than the new feature. > That just means it had been doing what it shouldn't for a very long > time. It doesn't get any good points for that. CC Mode has not been doing anything wrong in accessing the buffers it controls. The idea that one should access only the characters in the (BEG END) supplied by fontification_functions (and jit-lock) is false. It has no basis in rationality. And in fact, standard font-locking itself accesses (via syntax-ppss) all character positions from BOB to BEG. Just as a matter of interest, to whom it may concern, note that syntax-ppss behaves differently in narrowed buffers and widened buffers. In particular, it uses two distinct caches for these two cases, and erases the "narrowed" cache when point-min changes. This may have relevance for font-locking when widen isn't working. > > It would be more accurate to state that the new feature was, by design, > > incompatible with existing software. The new feature, by design, breaks > > long-standing contracts in Emacs, namely that `widen', etc., work. > > Of course, testing could have shown up this incompatibility at an early > > stage, perhaps even leading to a solution. A pity we didn't have more > > thorough testing. 
> > So, what do you intend to do about this incompatibility you have > > introduced? Anything? > Actually, I'd expect you, as the maintainer of CC Mode, to look into > this and try to fix whatever needs fixing. (But for now I cannot > reproduce the problem, so maybe there's nothing to fix.) I would have expected the implementor of a new feature to do his utmost not to break existing software, and should this unfortunately transpire, to work with others to fix it. In this case, the implementor, Gregory, seems overjoyed to have broken CC Mode, and appears to reject any responsibility for the breakage. > > What do you intend to do about this? Anything? > How about you? Somebody will have to clean up the mess, yes, and that task, with virtual certainty, will fall to me, even though I'd far rather be doing more productive things. Given how narrowing/widening is essential to all aspects of CC Mode, in particular to c-parse-state, and that c-parse-state is used in CC Mode's font-locking, it seems the only way forward is to disable font-locking entirely when narrowing/widening aren't working. Either that, or a complete redesign from scratch. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 16:23 ` Alan Mackenzie @ 2022-08-10 16:35 ` Eli Zaretskii 2022-08-10 16:50 ` Alan Mackenzie 2022-08-10 17:19 ` Gregory Heytings 1 sibling, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-08-10 16:35 UTC (permalink / raw) To: Alan Mackenzie; +Cc: gregory, larsi, emacs-devel > Date: Wed, 10 Aug 2022 16:23:27 +0000 > Cc: gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > > > Er, actually, CC Mode has been around a tad longer than the new feature. > > > That just means it had been doing what it shouldn't for a very long > > time. It doesn't get any good points for that. > > CC Mode has not been doing anything wrong in accessing the buffers it > controls. The idea that one should access only the characters in the > (BEG END) supplied by fontification_functions (and jit-lock) is false. > It has no basis in rationality. And in fact, standard font-locking > itself accesses (via syntax-ppss) all character positions from BOB to > BEG. You seem to disagree with a major idea of the design of the Emacs display engine. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 16:35 ` Eli Zaretskii @ 2022-08-10 16:50 ` Alan Mackenzie 2022-08-10 16:58 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2022-08-10 16:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gregory, larsi, emacs-devel Hello, Eli. On Wed, Aug 10, 2022 at 19:35:58 +0300, Eli Zaretskii wrote: > > Date: Wed, 10 Aug 2022 16:23:27 +0000 > > Cc: gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > > From: Alan Mackenzie <acm@muc.de> > > > > Er, actually, CC Mode has been around a tad longer than the new feature. > > > That just means it had been doing what it shouldn't for a very long > > > time. It doesn't get any good points for that. > > CC Mode has not been doing anything wrong in accessing the buffers it > > controls. The idea that one should access only the characters in the > > (BEG END) supplied by fontification_functions (and jit-lock) is false. > > It has no basis in rationality. And in fact, standard font-locking > > itself accesses (via syntax-ppss) all character positions from BOB to > > BEG. > You seem to disagree with a major idea of the design of the Emacs > display engine. I don't think I do. I think you mean the idea of lazy fontification, though you haven't been specific. This fontification is all about fontifying restricted areas of the buffer. There is no principle that one shouldn't look at distant portions of the buffer as need be, to facilitate the fontification of the restricted area. This is absolutely necessary correctly to fontify (long) strings and comments, for example. -- Alan Mackenzie (Nuremberg, Germany).
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 16:50 ` Alan Mackenzie @ 2022-08-10 16:58 ` Eli Zaretskii 2022-08-10 17:32 ` Alan Mackenzie 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-08-10 16:58 UTC (permalink / raw) To: Alan Mackenzie; +Cc: gregory, larsi, emacs-devel > Date: Wed, 10 Aug 2022 16:50:43 +0000 > Cc: gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > Hello, Eli. > > On Wed, Aug 10, 2022 at 19:35:58 +0300, Eli Zaretskii wrote: > > > Date: Wed, 10 Aug 2022 16:23:27 +0000 > > > Cc: gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > > > From: Alan Mackenzie <acm@muc.de> > > > > CC Mode has not been doing anything wrong in accessing the buffers it > > > controls. The idea that one should access only the characters in the > > > (BEG END) supplied by fontification_functions (and jit-lock) is false. > > > It has no basis in rationality. And in fact, standard font-locking > > > itself accesses (via syntax-ppss) all character positions from BOB to > > > BEG. > > > You seem to disagree with a major idea of the design of the Emacs > > display engine. > > I don't think I do. I think you mean the idea of lazy fontification, > though you haven't been specific. No, I mean the idea that redisplay processes only a small amount of buffer text around the window. > This fontification is all about fontifying restricted areas of the > buffer. There is no principle that one shouldn't look at distant > portions of the buffer as need be, to facilitate the fontification of > the restricted area. You are contradicting yourself. > This is absolutely necessary correctly to fontify (long) strings and > comments, for example. Only if you assume the most simplistic processing.
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 16:58 ` Eli Zaretskii @ 2022-08-10 17:32 ` Alan Mackenzie 2022-08-10 17:41 ` Eli Zaretskii 2022-08-11 15:47 ` Yuri Khan 0 siblings, 2 replies; 45+ messages in thread From: Alan Mackenzie @ 2022-08-10 17:32 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gregory, larsi, emacs-devel Hello, Eli. On Wed, Aug 10, 2022 at 19:58:31 +0300, Eli Zaretskii wrote: > > Date: Wed, 10 Aug 2022 16:50:43 +0000 > > Cc: gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > > From: Alan Mackenzie <acm@muc.de> > > On Wed, Aug 10, 2022 at 19:35:58 +0300, Eli Zaretskii wrote: > > > > Date: Wed, 10 Aug 2022 16:23:27 +0000 > > > > Cc: gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > > > > From: Alan Mackenzie <acm@muc.de> > > > > CC Mode has not been doing anything wrong in accessing the buffers it > > > > controls. The idea that one should access only the characters in the > > > > (BEG END) supplied by fontification_functions (and jit-lock) is false. > > > > It has no basis in rationality. And in fact, standard font-locking > > > > itself accesses (via syntax-ppss) all character positions from BOB to > > > > BEG. > > > You seem to disagree with a major idea of the design of the Emacs > > > display engine. > > I don't think I do. I think you mean the idea of lazy fontification, > > though you haven't been specific. > No, I mean the idea that redisplay processes only a small amount of > buffer text around the window. I don't think such an idea is coherent, due to the lack of precision of the word "processes". I understand that redisplay _fontifies_ only a small amount of buffer text. However, it can get better results if it is free to _look_ at text anywhere in the buffer. You seem to be conflating "fontifying" with "looking at". I don't think that's helpful. > > This fontification is all about fontifying restricted areas of the > > buffer. 
There is no principle that one shouldn't look at distant > > portions of the buffer as need be, to facilitate the fontification of > > the restricted area. > You are contradicting yourself. I can't see any contradiction in what I wrote there. > > This is absolutely necessary correctly to fontify (long) strings and > > comments, for example. > Only if you assume the most simplistic processing. If you open a file in its middle (e.g., by desktop), and there's an open block comment there, you've got to look arbitrarily far back to detect that state. In practice parse-partial-sexp from point-min will be used, likely with caching of whatever sort. -- Alan Mackenzie (Nuremberg, Germany).
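The point about an open block comment can be illustrated with a toy scanner. What follows is a minimal sketch in Python, not Emacs code and not how CC Mode or parse-partial-sexp is implemented. The function name `lexical_state_at` is hypothetical, and the scanner only knows `/* ... */` comments and double-quoted strings with backslash escapes. It shows that the state at a position depends on every delimiter before it, so the same text can be classified differently when only a local window is examined.

```python
def lexical_state_at(text: str, pos: int) -> str:
    """Return 'code', 'comment' or 'string': the state in effect at index pos.

    Toy scanner: handles only /* ... */ comments and "..." strings with
    backslash escapes. It always starts from index 0, because the state at
    pos depends on every delimiter that starts before pos.
    """
    state = "code"
    i = 0
    while i < pos:
        two = text[i:i + 2]
        if state == "code":
            if two == "/*":
                state = "comment"
                i += 2
                continue
            if text[i] == '"':
                state = "string"
        elif state == "comment":
            if two == "*/":
                state = "code"
                i += 2
                continue
        elif state == "string":
            if text[i] == "\\":
                i += 2  # skip the escaped character
                continue
            if text[i] == '"':
                state = "code"
        i += 1
    return state


# A comment opened early and never closed swallows everything after it.
text = 'int a; /* open comment\n' + 'x = "not a string";\n' * 3
whole_buffer_answer = lexical_state_at(text, text.index("not"))

# The same characters, examined without the text that precedes them.
tail = text[text.index("x ="):]
local_window_answer = lexical_state_at(tail, tail.index("not"))
```

Here `whole_buffer_answer` and `local_window_answer` differ for the same word, which is the situation described above when a file is opened in its middle. A cache of known-good states at intermediate positions would shorten the scan without changing the answer.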
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 17:32 ` Alan Mackenzie @ 2022-08-10 17:41 ` Eli Zaretskii 2022-08-10 22:31 ` Stefan Monnier ` (2 more replies) 2022-08-11 15:47 ` Yuri Khan 1 sibling, 3 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-08-10 17:41 UTC (permalink / raw) To: Alan Mackenzie; +Cc: gregory, larsi, emacs-devel > Date: Wed, 10 Aug 2022 17:32:46 +0000 > Cc: gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > > > > You seem to disagree with a major idea of the design of the Emacs > > > > display engine. > > > > I don't think I do. I think you mean the idea of lazy fontification, > > > though you haven't been specific. > > > No, I mean the idea that redisplay processes only a small amount of > > buffer text around the window. > > I don't think such an idea is coherent, due to the lack of precision of > the word "processes". I understand that redisplay _fontifies_ only a > small amount of buffer text. However, it can get better results if it is > free to _look_ at text anywhere in the buffer. Think about the _idea_ of that: we want to process as little as absolutely necessary for display. It follows that every code we invoke as part of that job should strive to do the same. > You seem to be conflating "fontifying" with "looking at". I don't think > that's helpful. I'm not talking about "looking at", I'm talking about processing. fontification-functions rarely go to far places because they just want to "look", they go there because they want to process text there, possibly process all the text from there to window start. > > > This is absolutely necessary correctly to fontify (long) strings and > > > comments, for example. > > > Only if you assume the most simplistic processing. > > If you open a file in its middle (e.g., by desktop), and there's an open > block comment there, you've got to look arbitrarily far back to detect > that state. Really? 
Then please tell me how is it that we the humans can detect incorrect fontifications even when shown partial strings and comments? We know that fontifications are incorrect, and where strings or comments start or end immediately, just after a single glance. We never need to go to BOB to find that out.
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 17:41 ` Eli Zaretskii @ 2022-08-10 22:31 ` Stefan Monnier 2022-08-11 6:21 ` Eli Zaretskii 2022-08-11 6:27 ` Immanuel Litzroth 2022-08-11 16:54 ` Alan Mackenzie 2022-08-12 13:05 ` Lynn Winebarger 2 siblings, 2 replies; 45+ messages in thread From: Stefan Monnier @ 2022-08-10 22:31 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Alan Mackenzie, gregory, larsi, emacs-devel > Really? Then please tell me how is it that we the humans can detect > incorrect fontifications even when shown partial strings and comments? That's usually because we can tell the difference between valid C code and human text and then based on that we can heuristically guess what's comment/strings/code. Our guesses can be wrong, tho. And making such a guess without some AI-style thingy is somewhat difficult. `syntax-begin-function` could use such a thing, but we made it obsolete because making it work well tends to be costly. But that was used for "really nearby" (i.e. find a safe spot near enough that we can avoid scanning 10kB of code, meaning that the heuristic shouldn't take more time than scanning 10kB of code). Maybe to speed up `syntax-ppss` in large buffers, we could re-introduce something like `syntax-begin-function` but where the idea is to try and find a safe spot heuristically within the preceding 1MB or so to avoid scanning the preceding GBs of code: this would give us a higher time-budget for the heuristic, making it possible to work well enough (and it would be used only in buffers >1MB). Stefan
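The bounded search for a safe spot described above might look roughly like the following. This is a minimal sketch in Python, not Emacs code. The function name `find_scan_start`, the `window` parameter, and the particular heuristic (an opening brace in column 0 is assumed to lie outside any comment or string) are all assumptions made for illustration; the message does not specify which heuristic would be used.

```python
def find_scan_start(text: str, pos: int, window: int = 1_000_000) -> int:
    """Guess an index at or before pos from which a scan can start in 'code' state.

    Heuristic (an assumption for illustration, and it can guess wrong):
    a '{' in column 0 is taken to be outside any comment or string.
    Only the `window` characters before pos are searched. If no candidate
    is found there, fall back to index 0, which means a full scan.
    """
    lower = max(0, pos - window)
    # Look for the last newline immediately followed by '{' before pos.
    i = text.rfind("\n{", lower, pos)
    if i != -1:
        return i + 1  # index of the '{' itself
    return 0


# Hypothetical input: a long run of filler followed by a block.
src = "/* header */\n" + "int filler;\n" * 1000 + "{\n  body();\n}\n"
pos = src.index("body")
start = find_scan_start(src, pos, window=100)
```

With a small window the search stays cheap, and the cost of a wrong guess is a misfontified region rather than a long pause. A `{` in column 0 inside a block comment would defeat this particular heuristic, which is the kind of wrong guess mentioned above.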
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 22:31 ` Stefan Monnier @ 2022-08-11 6:21 ` Eli Zaretskii 2022-08-11 7:37 ` Stefan Monnier 2022-08-11 6:27 ` Immanuel Litzroth 1 sibling, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2022-08-11 6:21 UTC (permalink / raw) To: Stefan Monnier; +Cc: acm, gregory, larsi, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Alan Mackenzie <acm@muc.de>, gregory@heytings.org, larsi@gnus.org, > emacs-devel@gnu.org > Date: Wed, 10 Aug 2022 18:31:00 -0400 > > > Really? Then please tell me how is it that we the humans can detect > > incorrect fontifications even when shown partial strings and comments? > > That's usually because we can tell the difference between valid C code > and human text and then based on that we can heuristically guess what's > comment/strings/code. > Our guesses can be wrong, tho. And making such a guess without some > AI-style thingy is somewhat difficult. All true, but the alternative of going to BOB/BOL and coming back is much worse when BOB/BOL is very far away. > `syntax-begin-function` could use such a thing, but we made it obsolete > because making it work well tends to be costly. More costly than going back 20 million characters and then coming back? IOW, what is and isn't costly could change when compared with alternatives. For example, I recently made one of the low-level subroutines in the display engine do something "costly" when the buffer has very long lines, and the result turned out to be a very large win in those cases. The "heavy" calculation is only used when the buffer is flagged to have very long lines; we could do something similar for syntax analysis as well.
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-11 6:21 ` Eli Zaretskii @ 2022-08-11 7:37 ` Stefan Monnier 0 siblings, 0 replies; 45+ messages in thread From: Stefan Monnier @ 2022-08-11 7:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: acm, gregory, larsi, emacs-devel >> `syntax-begin-function` could use such a thing, but we made it obsolete >> because making it work well tends to be costly. > More costly than going back 20 million characters and then coming back? The design decisions around `syntax-begin-function` were based on experience for "normal" files. IOW, for normal files it proved too costly (either in terms of execution or programming costs, or in terms of extra coding style requirements imposed on the user (e.g. open-paren-in-column-0)). In the message to which you replied I indeed point out that the trade-off might very well be different for large files. > IOW, what is and isn't costly could change when compared with > alternatives. For example, I recently made one of the low-level > subroutines in the display engine do something "costly" when the > buffer has very long lines, and the result turned out to be a very > large win in those cases. The "heavy" calculation is only used when > the buffer is flagged to have very long lines; we could do something > similar for syntax analysis as well. I think we're in agreement. Stefan
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 22:31 ` Stefan Monnier 2022-08-11 6:21 ` Eli Zaretskii @ 2022-08-11 6:27 ` Immanuel Litzroth 1 sibling, 0 replies; 45+ messages in thread From: Immanuel Litzroth @ 2022-08-11 6:27 UTC (permalink / raw) To: Stefan Monnier; +Cc: Eli Zaretskii, Alan Mackenzie, gregory, larsi, emacs-devel On Thu, Aug 11, 2022 at 12:32 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote: > > > Really? Then please tell me how is it that we the humans can detect > > incorrect fontifications even when shown partial strings and comments? > > That's usually because we can tell the difference between valid C code > and human text and then based on that we can heuristically guess what's > comment/strings/code. > Our guesses can be wrong, tho. And making such a guess without some > AI-style thingy is somewhat difficult. Hear, hear. Although even AI will not be able to do that well given that any block of C code might be inside a comment that's inside a raw string... Also I've seen some people suggesting that tree-sitter might be a solution. I did check tree-sitter's C++ parsing earlier this year and the syntactic information it gave was just not detailed enough to do much with. A happy CC mode user, Immanuel -- A man must either resolve to point out nothing new or to become a slave to defend it. -- Sir Isaac Newton
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 17:41 ` Eli Zaretskii 2022-08-10 22:31 ` Stefan Monnier @ 2022-08-11 16:54 ` Alan Mackenzie 2022-08-11 17:15 ` Eli Zaretskii 2022-08-12 13:05 ` Lynn Winebarger 2 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2022-08-11 16:54 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gregory, larsi, emacs-devel Hello, Eli. On Wed, Aug 10, 2022 at 20:41:37 +0300, Eli Zaretskii wrote: > > Date: Wed, 10 Aug 2022 17:32:46 +0000 > > Cc: gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > > From: Alan Mackenzie <acm@muc.de> > > > > > You seem to disagree with a major idea of the design of the Emacs > > > > > display engine. > > > > I don't think I do. I think you mean the idea of lazy fontification, > > > > though you haven't been specific. > > > No, I mean the idea that redisplay processes only a small amount of > > > buffer text around the window. > > I don't think such an idea is coherent, due to the lack of precision of > > the word "processes". I understand that redisplay _fontifies_ only a > > small amount of buffer text. However, it can get better results if it is > > free to _look_ at text anywhere in the buffer. > Think about the _idea_ of that: we want to process as little as > absolutely necessary for display. It follows that every code we > invoke as part of that job should strive to do the same. I suppose the problem here is differing notions of "absolute necessity". > > You seem to be conflating "fontifying" with "looking at". I don't think > > that's helpful. > I'm not talking about "looking at", I'm talking about processing. > fontification-functions rarely go to far places because they just want > to "look", they go there because they want to process text there, > possibly process all the text from there to window start. OK, I think I see what you mean, now. 
But I still think it's useful to make a distinction between looking at (for example, the 17 ns per character that parse-partial-sexp takes) and something like the fontification of a C declaration by c-font-lock-declarations, which takes much, much longer. > > > > This is absolutely necessary correctly to fontify (long) strings and > > > > comments, for example. > > > Only if you assume the most simplistic processing. > > If you open a file in its middle (e.g., by desktop), and there's an open > > block comment there, you've got to look arbitrarily far back to detect > > that state. > Really? Then please tell me how is it that we the humans can detect > incorrect fontifications even when shown partial strings and comments? I can't really see how that question follows on from the premiss, but human brains are wired to detect patterns at a single glance in a way that computers aren't, at least not yet. > We know that fontifications are incorrect, and where strings or > comments start or end immediately, just after a single glance. We > never need to go to BOB to find that out. Before the days of font-locking in editors, a standard problem was when a comment didn't end where the user thought it did, for lack of a comment ender. There was a particular problem in Pascal (whose precise details aren't that important) where an unclosed comment on the THEN branch of an IF statement would swallow up the ELSE branch completely, leaving no visible trace or syntactic error. It's worth while being careful about strings and comments. -- Alan Mackenzie (Nuremberg, Germany).
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-11 16:54 ` Alan Mackenzie @ 2022-08-11 17:15 ` Eli Zaretskii 0 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-08-11 17:15 UTC (permalink / raw) To: Alan Mackenzie; +Cc: gregory, larsi, emacs-devel > Date: Thu, 11 Aug 2022 16:54:47 +0000 > Cc: gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > From: Alan Mackenzie <acm@muc.de> > > > We know that fontifications are incorrect, and where strings or > > comments start or end immediately, just after a single glance. We > > never need to go to BOB to find that out. > > Before the days of font-locking in editors, a standard problem was when a > comment didn't end where the user thought it did, for lack of a comment > ender. There was a particular problem in Pascal (whose precise details > aren't that important) where an unclosed comment on the THEN branch of an > IF statement would swallow up the ELSE branch completely, leaving no > visible trace or syntactic error. > > It's worth while being careful about strings and comments. As already mentioned up-thread, guessing without going far away could indeed guess wrong, but it is much cheaper and the probability of an error could be made small enough to be acceptable in cases where "being careful" means one has to wait for many seconds for a response for a simple editing command. The guesswork could be activated only when these situations arise, leaving the more accurate fontifications to do their job in all the other cases.
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 17:41 ` Eli Zaretskii 2022-08-10 22:31 ` Stefan Monnier 2022-08-11 16:54 ` Alan Mackenzie @ 2022-08-12 13:05 ` Lynn Winebarger 2022-08-12 13:18 ` Eli Zaretskii 2 siblings, 1 reply; 45+ messages in thread From: Lynn Winebarger @ 2022-08-12 13:05 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Alan Mackenzie, gregory, Lars Ingebrigtsen, emacs-devel On Wed, Aug 10, 2022, 1:44 PM Eli Zaretskii <eliz@gnu.org> wrote: > > Really? Then please tell me how is it that we the humans can detect > incorrect fontifications even when shown partial strings and comments? > We know that fontifications are incorrect, and where strings or > comments start or end immediately, just after a single glance. We > never need to go to BOB to find that out. Serious question: is fontification intended to display text according to what the author probably intended, or according to how a compiler will process that text (leaving correctness to a more precise tool than font-lock, whether Semantic, tree-sitter, LSP, whatever)? Because I can definitely write code that has some subtle issue that I will miss, and erroneously think should display one way but which would be processed in a different way. Should fontification show my likely intention (plus, and only for bonus points, possibly highlight the error that disconnects the likely intended from the actual parse), or should it display according to the way the tools will interpret it so the author will find errors that way? When I use a dedicated IDE of recent vintage, it feels less like I am writing a stream of characters than filling in partially constructed objects representing the abstract syntax of the language I'm writing in (with grammar that has allowances for incomplete or erroneous constructs), with the text being displayed as a representation of the underlying object. 
IOW, the relationship of the syntactic object and the text is inverted compared to Emacs's design, where (if I understand correctly) the properties of the syntactic object are only tied to the text through text properties. With the other approach, the fontification and the syntax object are tied together, but with Emacs the relationship seems much more tenuous. E.g. completion and fontification are completely separate activities as far as I know, though the same contextual information should be useful for both activities. I have this CC-mode derived mode for a DSL I did not design. I'm currently the sole user of the mode, so I just wanted something quick and dirty. But as the pile of code I deal with in this DSL grows, I want to put in Semantic support for it to get context-aware completion, precise fontification, etc. The current discussion has made me wonder if deriving from CC mode is having some non-obvious effects on how font-lock works, making it non-local in ways that are not necessary, so the re-entrant nature of the Semantic parsers won't cure some of the slowness. For example, I want to use the font-lock of that mode in the REPL to fontify the statements/expressions I enter at the prompt, but otherwise ignore text. Particularly, at the end and the beginning of the REPL buffer. I don't want to narrow the buffer, just the area fontification applies to. Fontifying hundreds of megabytes of tracing print statements is not just unnecessary, it's bad news for the GC even after the buffer is cleared IME. If CC mode is determining more syntactic information than tree-sitter's incremental parsing provides (per Immanuel Litzroth's comment in this thread), then there is a disconnect somewhere in the scope of expectations for what font-lock is supposed to do. I'm certainly not clear (yet) on how to cleanly separate and then rejoin a proper syntactic analysis with fontification, and if there is "an Emacs way" to do it. 
Lynn
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-12 13:05 ` Lynn Winebarger @ 2022-08-12 13:18 ` Eli Zaretskii 0 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-08-12 13:18 UTC (permalink / raw) To: Lynn Winebarger; +Cc: acm, gregory, larsi, emacs-devel > From: Lynn Winebarger <owinebar@gmail.com> > Date: Fri, 12 Aug 2022 09:05:06 -0400 > Cc: Alan Mackenzie <acm@muc.de>, gregory@heytings.org, Lars Ingebrigtsen <larsi@gnus.org>, > emacs-devel <emacs-devel@gnu.org> > > Really? Then please tell me how is it that we the humans can detect > incorrect fontifications even when shown partial strings and comments? > We know that fontifications are incorrect, and where strings or > comments start or end immediately, just after a single glance. We > never need to go to BOB to find that out. > > Serious question: is fontification intended to display text according to what the author probably intended, or > according to how a compiler will process that text (leaving correctness to a more precise tool than font-lock, > whether Semantic, tree-sitter, LSP, whatever)? I fail to see how this question is relevant to the issue at hand, which is what should be the behavior of fontification functions in very large files and files with very long lines. > Because I can definitely write code that has some subtle issue that I will miss, and erroneously think should > display one way but which would be processed in a different way. Should fontification show my likely > intention (plus, and only for bonus points, possibly highlight the error that disconnects the likely intended from > the actual parse), or should it display according to the way the tools will interpret it so the author will find > errors that way? Ideally, the latter. But not at a price of making moving through and editing the file impractical. 
> When I use a dedicated IDE of recent vintage, it feels less like I am writing a stream of characters than filling > in partially constructed objects representing the abstract syntax of the language I'm writing in (with grammar > that has allowances for incomplete or erroneous constructs), with the text being displayed as a > representation of the underlying object. IOW, the relationship of the syntactic object and the text is inverted > compared to Emacs's design, where (if I understand correctly) the properties of the syntactic object are only > tied to the text through text properties. With the other approach, the fontification and the syntax object are > tied together, but with Emacs the relationship seems much more tenuous. E.g. completion and fontification > are completely separate activities as far as I know, though the same contextual information should be useful > for both activities. That is correct, for the current Emacs design. > If CC mode is determining more syntactic information than tree-sitter's incremental parsing provides (per > Immanuel Litzroth's comment in this thread) I don't think this is true, FWIW.
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 17:32 ` Alan Mackenzie 2022-08-10 17:41 ` Eli Zaretskii @ 2022-08-11 15:47 ` Yuri Khan 2022-08-11 16:04 ` Eli Zaretskii 1 sibling, 1 reply; 45+ messages in thread From: Yuri Khan @ 2022-08-11 15:47 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Eli Zaretskii, gregory, larsi, emacs-devel On Thu, 11 Aug 2022 at 00:36, Alan Mackenzie <acm@muc.de> wrote: > If you open a file in its middle (e.g., by desktop), and there's an open > block comment there, you've got to look arbitrarily far back to detect > that state. In practice parse-partial-sexp from point-min will be used, > likely with caching of whatever sort. Does fontification need to be synchronous? I.e. you open a file in its middle, do you expect it fontified exactly and immediately on first render? Some well-known IDEs[^*] make do with first rendering an inexact approximation of syntax highlighting. Then, as the user starts working with the buffer, they may have a few thousand idle milliseconds in which to improve the approximation. [^*]: e.g. one that starts with “vs” and ends with “code”.
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-11 15:47 ` Yuri Khan @ 2022-08-11 16:04 ` Eli Zaretskii 0 siblings, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-08-11 16:04 UTC (permalink / raw) To: Yuri Khan; +Cc: acm, gregory, larsi, emacs-devel > From: Yuri Khan <yuri.v.khan@gmail.com> > Date: Thu, 11 Aug 2022 22:47:34 +0700 > Cc: Eli Zaretskii <eliz@gnu.org>, gregory@heytings.org, larsi@gnus.org, emacs-devel@gnu.org > > On Thu, 11 Aug 2022 at 00:36, Alan Mackenzie <acm@muc.de> wrote: > > > If you open a file in its middle (e.g., by desktop), and there's an open > > block comment there, you've got to look arbitrarily far back to detect > > that state. In practice parse-partial-sexp from point-min will be used, > > likely with caching of whatever sort. > > Does fontification need to be synchronous? I.e. you open a file in its > middle, do you expect it fontified exactly and immediately on first > render? > > Some well-known IDEs[^*] make do with first rendering an inexact > approximation of syntax highlighting. Then, as the user starts working > with the buffer, they may have a few thousand idle milliseconds in > which to improve the approximation. That's what jit-lock-defer-time is about, I believe (although our "inexact approximation" is no fontification at all).
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 16:23 ` Alan Mackenzie 2022-08-10 16:35 ` Eli Zaretskii @ 2022-08-10 17:19 ` Gregory Heytings 2022-08-10 17:21 ` Eli Zaretskii 2022-08-10 19:45 ` Alan Mackenzie 1 sibling, 2 replies; 45+ messages in thread From: Gregory Heytings @ 2022-08-10 17:19 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Eli Zaretskii, larsi, emacs-devel > > In this case, the implementor, Gregory, seems overjoyed to have broken > CC Mode, > I don't know what makes you believe such an absurd thing. Let me set the record straight: No, I'm not at all "overjoyed to have broken CC Mode".
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 17:19 ` Gregory Heytings @ 2022-08-10 17:21 ` Eli Zaretskii 2022-08-10 19:45 ` Alan Mackenzie 1 sibling, 0 replies; 45+ messages in thread From: Eli Zaretskii @ 2022-08-10 17:21 UTC (permalink / raw) To: Gregory Heytings; +Cc: acm, larsi, emacs-devel > Date: Wed, 10 Aug 2022 17:19:11 +0000 > From: Gregory Heytings <gregory@heytings.org> > cc: Eli Zaretskii <eliz@gnu.org>, larsi@gnus.org, emacs-devel@gnu.org > > > In this case, the implementor, Gregory, seems overjoyed to have broken > > CC Mode, > > I don't know what makes you believe such an absurd thing. It is also a very unkind thing to say. Alan, you should apologize.
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 17:19 ` Gregory Heytings 2022-08-10 17:21 ` Eli Zaretskii @ 2022-08-10 19:45 ` Alan Mackenzie 2022-08-14 20:15 ` Alan Mackenzie 1 sibling, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2022-08-10 19:45 UTC (permalink / raw) To: Gregory Heytings; +Cc: Eli Zaretskii, larsi, emacs-devel On Wed, Aug 10, 2022 at 17:19:11 +0000, Gregory Heytings wrote: > > In this case, the implementor, Gregory, seems overjoyed to have broken > > CC Mode, > I don't know what makes you believe such an absurd thing. Let me set the > record straight: No, I'm not at all "overjoyed to have broken CC Mode". [ Taken to private mail. ] -- Alan Mackenzie (Nuremberg, Germany).
* Re: New optimisations for long raw strings in C++ Mode. 2022-08-10 19:45 ` Alan Mackenzie @ 2022-08-14 20:15 ` Alan Mackenzie 2022-08-15 8:00 ` Gregory Heytings 0 siblings, 1 reply; 45+ messages in thread From: Alan Mackenzie @ 2022-08-14 20:15 UTC (permalink / raw) To: Gregory Heytings; +Cc: Eli Zaretskii, larsi, emacs-devel To Emacs. On Wed, Aug 10, 2022 at 19:45:23 +0000, Alan Mackenzie wrote: > On Wed, Aug 10, 2022 at 17:19:11 +0000, Gregory Heytings wrote: > > > In this case, the implementor, Gregory, seems overjoyed to have broken > > > CC Mode, > > I don't know what makes you believe such an absurd thing. Let me set the > > record straight: No, I'm not at all "overjoyed to have broken CC Mode". > [ Taken to private mail. ] Full scale apology to Gregory for the remark, which was unjustified and uncalled for. -- Alan Mackenzie (Nuremberg, Germany).
* Re: New optimisations for long raw strings in C++ Mode.
From: Gregory Heytings @ 2022-08-15 8:00 UTC
To: Alan Mackenzie; +Cc: Eli Zaretskii, larsi, emacs-devel

>
> Full scale apology to Gregory for the remark, which was unjustified and
> uncalled for.
>

Thanks Alan! This is much appreciated, a private apology was already more
than enough for me.
* Re: New optimisations for long raw strings in C++ Mode.
From: Eli Zaretskii @ 2022-08-10 13:25 UTC
To: Lars Ingebrigtsen; +Cc: acm, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> Date: Tue, 09 Aug 2022 17:38:14 +0200
>
> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
> > I can confirm that this is fast now; thanks.
>
> But opening the resulting file with "emacs -Q" and then doing a `M->'
> now hangs Emacs, which it didn't use to do, I think?

It doesn't hang here, neither with long-line-threshold set to nil nor
with its default value. (In the latter case, M-> is much faster.)

Lars, can you post the file with which you see the hang? I suspect we
are not using the same contents.
* Re: New optimisations for long raw strings in C++ Mode.
From: Lars Ingebrigtsen @ 2022-08-12 12:44 UTC
To: Eli Zaretskii; +Cc: acm, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Lars, can you post the file with which you see the hang? I suspect we
> are not using the same contents.

It's with this recipe:

> > Simplest case to reproduce:
>
> > M-: (setq long-line-threshold nil)
>
> > Go to a cc buffer containing:
>
> > char long_line[] = R"foo(
>
> > )foo"
>
> > M-: (insert (make-string 1000000 ?y))
>
> > on the second line.

And then `M->' -- this will hang Emacs (until you hit `C-g', so it's not
a "hard" hang).
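The recipe above builds the buffer from inside Emacs. As a rough sketch only, and not something posted in this thread, the same text could also be written to disk so that everybody tests identical contents. The function names `make_raw_string_source` and `write_raw_string_file` are hypothetical, and the missing `;` after `)foo"` is deliberate, since the recipe as posted has none.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: build the buffer text from the recipe, i.e. a raw
// string with delimiter "foo" whose second line is `n` copies of `fill`.
std::string make_raw_string_source(std::size_t n, char fill = 'y') {
    std::string src = "char long_line[] = R\"foo(\n";
    src.append(n, fill);     // the long line, e.g. n = 1000000
    src += "\n)foo\"\n";     // no trailing ';', matching the recipe as posted
    return src;
}

// Hypothetical helper: write that text to `path`.
// Returns false if the file could not be opened or written.
bool write_raw_string_file(const std::string& path, std::size_t n) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out << make_raw_string_source(n);
    return static_cast<bool>(out);
}
```

Calling `write_raw_string_file("long-line.cc", 1000000)` from a small `main` would give a file to visit with "emacs -Q" before trying `M->'. Visiting a file and inserting into a live buffer need not exercise the same code paths, so timings from the two may differ.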
* Re: New optimisations for long raw strings in C++ Mode.
From: Eli Zaretskii @ 2022-08-12 12:52 UTC
To: Lars Ingebrigtsen; +Cc: acm, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: acm@muc.de, emacs-devel@gnu.org
> Date: Fri, 12 Aug 2022 14:44:57 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Lars, can you post the file with which you see the hang? I suspect we
> > are not using the same contents.
>
> It's with this recipe:
>
> > > Simplest case to reproduce:
> >
> > > M-: (setq long-line-threshold nil)
> >
> > > Go to a cc buffer containing:
> >
> > > char long_line[] = R"foo(
> >
> > > )foo"
> >
> > > M-: (insert (make-string 1000000 ?y))
> >
> > > on the second line.
>
> And then `M->' -- this will hang Emacs (until you hit `C-g', so it's not
> a "hard" hang).

It doesn't hang here, even though my Emacs is unoptimized. It takes
maybe 10 sec or so.
* Re: New optimisations for long raw strings in C++ Mode.
From: Eli Zaretskii @ 2022-08-07 15:00 UTC
To: Alan Mackenzie; +Cc: larsi, emacs-devel

> Date: Sun, 7 Aug 2022 14:40:47 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > It's still hanging for me after 20s.

I killed it after 7 minutes (this is an unoptimized build).

> Did you omit step (ii) above, by any chance?

I didn't.