> Note that the fact that tree-sitter provides incremental parse is a
> strong hint that the answer will be "it's not fast enough".

That's a non sequitur; it can also mean that really huge files can be
worked on just as if they were a couple of hundred lines long (after the
first parse, that is).

On Thu, Apr 2, 2020 at 19:56, Stephen Leake
(stephen_leake@stephe-leake.org) wrote:

> Eli Zaretskii writes:
>
> >> From: Stephen Leake
> >> Date: Wed, 01 Apr 2020 11:51:40 -0800
> >>
> >> Eli Zaretskii writes:
> >>
> >> > Can you tell in more detail why you need to rely on these hooks? They
> >> > shouldn't be necessary, AFAIU.
> >>
> >> It is an optimization choice.
> >>
> >> In an unmodified buffer, that is smaller than 100,000 characters
> >> (default setting of wisi-partial-parse-threshold), the entire buffer is
> >> parsed once; that applies faces to all the Ada identifiers that need
> >> faces (standard font-lock regexp handles the reserved words). Then when
> >> font-lock fontifies a region, no parsing is needed.
> >
> > But why do you need that initial full parse in the first place? Is
> > parsing parts of the buffer so much harder?
>
> Because the parser must see a complete top level grammar statement. In
> Ada, that's the whole file; a typical file looks like:
>
> package Nifty is
>
>    type Foo is ...;
>
>    function Function_1 is ...;
>
> end Nifty;
>
> The parser needs to see all of the "package" declaration. Java and C++
> header files are similar; a single class or namespace. In C++ and C body
> files, there are lots of small declarations, and you could parse each
> one of those independently, but _only_ if Emacs can find the start and
> end of each, which is hard.
>
> In addition, to properly compute indent, you need the fully nested
> context. Computing faces usually doesn't need that, but it might in some
> cases.
>
> >> Indent is similar; the parse sets text properties holding the indent for
> >> each line; indent-region then applies them.
> >
> > Indent is a different use case: it happens by user command, and thus
> > has different time restrictions than redisplay.
>
> Yes, but it is computed by the same parser, so it is relevant.
>
> >> If the default setting of jit-lock-defer-time (ie nil) is used, then
> >> font-lock runs immediately after each change, and the after-change hooks
> >> are not needed. But as I have mentioned, I always run with
> >> jit-lock-defer-time set to 1.0 (because parsing is not fast enough in
> >> some cases), so the change hooks are needed.
> >
> > AFAIU, tree-sitter and similar parsers are supposed to be much faster,
> > so the problem with slow parsing, and all the solutions to alleviate
> > that problem, may not be necessary, if they are the only reason for
> > using the hooks.
>
> The main reason the ada-mode parser is too slow is the error correction.
> tree-sitter appears to have less sophisticated error correction, which
> will give worse results with code under edit. The ada-mode parser can be
> speeded up by specifying parameters that cripple the error correction.
>
> In addition, users will always create huge files (where "huge" means
> "bigger than we've seen before"); there are always speed limits. The
> reason ada-mode has partial parse is that Eurocontrol has huge files,
> that they occasionally edit, and always parsing the whole file, even in
> the absence of syntax errors, was too slow.
>
> >> The alternative to not requiring after-change hooks is to always do a full
> >> parse, for every call of fontify-region or indent-region. That is far too
> >> slow.
> >
> > Even for indentation, a full parse should not be needed. You need to
> > only parse the outermost enclosing function/procedure, right? That's
> > rarely the full buffer, except when the buffer is small.
>
> As discussed above, that depends on your language; in Ada it is _always_
> the full buffer. And finding the start of a function in C and C++ is hard.
>
> >> Note that Tree-Sitter requires one full parse of the buffer to generate
> >> the parse tree that is later updated incrementally; in an unmodified
> >> buffer, only that one parse is needed.
> >
> > Tree-sitter cannot know what the full buffer holds, so nothing
> > prevents us from passing it just part of the buffer. After all,
> > tree-sitter should be able to do a decent job when the part we pass to
> > it actually _is_ all we have in the buffer, right?
>
> Same issues as above.
>
> >> > And they cannot pick up every relevant change; for example, what
> >> > happens if some face used for font-lock is modified?
> >>
> >> Yes, that is a flaw. Not likely to occur in everyday use
> >
> > Redisplay cannot rely on something being "unlikely", because it's
> > expected to produce correct results in all situations.
>
> The flaw is not in ada-mode's use of a parser or after-change-functions;
> it's a general problem with font-lock.
>
> The face values are applied to the buffer text as text properties
> containing the symbol that holds the face to be used; for example
> (font-lock-face font-lock-function-name-face). If the contents of that
> symbol change, then redisplay must be rerun to apply the correct values.
> This does _not_ require a reparse; the parser sets the text property,
> and that has not changed.
>
> Use case: A c-mode buffer A is currently displayed in a window in a
> frame, it is syntactically correct, and all displayed faces are correct.
> In another frame, the user uses 'M-x set-variable' to change the value
> of font-lock-function-name-face.
>
> To update the display, something has to trigger redisplay of buffer A. I
> don't think using M-x set-variable in a different frame does that.
>
> Switching buffers in a frame does cause a redisplay (to update the menu
> and mode line); If M-x set-variable is done in the same frame as buffer
> A, the change in font-lock-function-name-face should show up as
> expected.
>
> A similar use case would be changing from "light mode" to "dark mode".
> That could be done by changing the theme using load-theme; that should
> force a redisplay (I assume it does; I have not checked).
>
> Other than the global face variables, ada-mode does not have any
> variables that control faces. Some other modes may, for example setting
> the level of highlighting to minimal or max. In that case, the font-lock
> regexps change, and the function that does that presumably sets
> fontified to nil in the current buffer, and should also force redisplay.
> If ada-mode adds a feature like this, there will be a function to change
> it (perhaps a custom variable change function) that also forces a
> reparse and redisplay.
>
> > I can understand why fontification methods that are too slow want to
> > get some help from hooks, but when we design and implement novel
> > fontification methods using fast parsers, we should first try doing
> > that without any hooks,
>
> Yes, premature optimization is evil. Using tree-sitter to implement
> font-lock should start by always parsing the whole buffer for every call
> of fontify-region. If that is fast enough, we're done. If not, we can
> consider whether parsing a smaller part of the buffer is possible.
>
> Note that the fact that tree-sitter provides incremental parse is a
> strong hint that the answer will be "it's not fast enough".
>
> >> >> By default font-lock runs after every character typed
> >> >
> >> > No, it only runs when redisplay kicks in. If you type very quickly,
> >> > it won't run for every character. At least AFAIR.
> >>
> >> What triggers redisplay?
> >
> > When Emacs is about to read input, if no input is available, it
> > performs redisplay. IOW, Emacs enters redisplay when it's about to
> > become idle.
> >
> >> The elisp manual section "Forcing redisplay" says "Emacs normally tries
> >> to redisplay the screen whenever it waits for input."
> >> After I type the first character, it is no longer waiting for input,
> >> it is processing that character. I assume here "process that char code"
> >> includes running after-change-functions, which is (small) elisp code. But
> >> I guess after processing that char, before calling redisplay, it checks if
> >> there is more input, which should be true if I type fast enough. Perhaps
> >> "process that char code" is faster than the combination of my fingers and
> >> the keyboard char send rate?
> >
> > Yes, most probably.
>
> Ok, so in practice, it is not possible to type fast enough, and
> font-lock runs after every character typed.
>
> > In other similar situations (e.g., in Flyspell mode) we wait for some
> > non-zero idle time before actually running the code which could react
> > to slow typing with annoying messages.
>
> Since font-lock is running a parser, it detects syntax errors. I
> could delay the display of the fringe mark, without delaying font-lock
> itself. I'll put that on my list.
>
> --
> -- Stephe