> Note that the fact that tree-sitter provides incremental parse is a
> strong hint that the answer will be "it's not fast enough".

That's a non sequitur; it can also mean that really huge files can be
worked on just as if they were a couple of hundred lines long (after the
first parse, that is).
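
To make that concrete, here is a toy sketch in Python. It is not
tree-sitter's algorithm or API: tree-sitter reuses unchanged subtrees of
the whole syntax tree, and this sketch ignores the nesting-context problem
Stephen describes below. The hypothetical ToyIncrementalParser just caches
one result per line. The first parse therefore costs work proportional to
the whole file, and each later edit costs work proportional to the edited
line.

```python
class ToyIncrementalParser:
    """Caches a per-line 'parse' result; an edit re-parses only the edited line.

    Hypothetical toy: real incremental parsers (e.g. tree-sitter) reuse
    unchanged subtrees of a full syntax tree, which this does not model.
    """

    KEYWORDS = {"package", "is", "end", "type", "function", "new"}

    def __init__(self, text):
        self.lines = text.split("\n")
        self.work = 0  # total number of lines parsed so far
        # The first parse touches every line of the file.
        self.results = [self._parse_line(line) for line in self.lines]

    def _parse_line(self, line):
        self.work += 1
        # Stand-in for real parsing: classify each word as keyword or identifier.
        return [("kw" if w in self.KEYWORDS else "id", w) for w in line.split()]

    def edit_line(self, index, new_text):
        # Only the edited line is re-parsed; all other cached results are reused.
        self.lines[index] = new_text
        self.results[index] = self._parse_line(new_text)
```

With a 100,000-line file, `work` is 100,000 after construction and
100,001 after a single `edit_line` call.
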

On Thu, Apr 2, 2020 at 19:56, Stephen Leake (
stephen_leake@stephe-leake.org) wrote:

> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> From: Stephen Leake <stephen_leake@stephe-leake.org>
> >> Date: Wed, 01 Apr 2020 11:51:40 -0800
> >>
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >>
> >> > Can you tell in more detail why you need to rely on these hooks?  They
> >> > shouldn't be necessary, AFAIU.
> >>
> >> It is an optimization choice.
> >>
> >> In an unmodified buffer that is smaller than 100,000 characters
> >> (default setting of wisi-partial-parse-threshold), the entire buffer is
> >> parsed once; that applies faces to all the Ada identifiers that need
> >> faces (standard font-lock regexp handles the reserved words). Then when
> >> font-lock fontifies a region, no parsing is needed.
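
A toy Python sketch of that scheme, under stated assumptions: ParsedFaces,
toy_parse and the per-character face dictionary are hypothetical stand-ins
for wisi's parser and Emacs text properties, and re-parsing here happens
lazily on the first fontify request after a change. What it shows is that
fontify_region only reads stored faces; the parser runs once per change,
not once per fontification request.

```python
import re


def toy_parse(text):
    # Hypothetical parser: returns (start, end, face) for names like Function_1.
    return [(m.start(), m.end(), "font-lock-function-name-face")
            for m in re.finditer(r"Function_\w+", text)]


class ParsedFaces:
    """Toy model: one full parse stores faces per character; fontify only reads them."""

    def __init__(self, text, parse):
        self.text = text
        self.parse = parse
        self.parse_count = 0
        self._full_parse()

    def _full_parse(self):
        self.parse_count += 1
        self.faces = {}
        for start, end, face in self.parse(self.text):
            for pos in range(start, end):
                self.faces[pos] = face
        self.stale = False

    def after_change(self, new_text):
        # Plays the role of an after-change hook: just record that the cache is stale.
        self.text = new_text
        self.stale = True

    def fontify_region(self, start, end):
        # Re-parse only if something changed since the last parse.
        if self.stale:
            self._full_parse()
        return [self.faces.get(pos) for pos in range(start, end)]
```
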
> >
> > But why do you need that initial full parse in the first place?  Is
> > parsing parts of the buffer so much harder?
>
> Because the parser must see a complete top-level grammar statement. In
> Ada, that's the whole file; a typical file looks like:
>
> package Nifty is
>
>     type Foo is ...;
>
>     function Function_1 is ...;
>
> end Nifty;
>
> The parser needs to see all of the "package" declaration. Java and C++
> header files are similar; a single class or namespace. In C++ and C body
> files, there are lots of small declarations, and you could parse each
> one of those independently, but _only_ if Emacs can find the start and
> end of each, which is hard.
>
> In addition, to properly compute indent, you need the fully nested
> context. Computing faces usually doesn't need that, but it might in some
> cases.
>
> >> Indent is similar; the parse sets text properties holding the indent for
> >> each line; indent-region then applies them.
> >
> > Indent is a different use case: it happens by user command, and thus
> > has different time restrictions than redisplay.
>
> Yes, but it is computed by the same parser, so it is relevant.
>
> >> If the default setting of jit-lock-defer-time (i.e. nil) is used, then
> >> font-lock runs immediately after each change, and the after-change hooks
> >> are not needed. But as I have mentioned, I always run with
> >> jit-lock-defer-time set to 1.0 (because parsing is not fast enough in
> >> some cases), so the change hooks are needed.
> >
> > AFAIU, tree-sitter and similar parsers are supposed to be much faster,
> > so the problem with slow parsing, and all the solutions to alleviate
> > that problem, may not be necessary, if they are the only reason for
> > using the hooks.
>
> The main reason the ada-mode parser is too slow is the error correction.
> tree-sitter appears to have less sophisticated error correction, which
> will give worse results with code under edit. The ada-mode parser can be
> speeded up by specifying parameters that cripple the error correction.
>
> In addition, users will always create huge files (where "huge" means
> "bigger than we've seen before"); there are always speed limits. The
> reason ada-mode has partial parse is that Eurocontrol has huge files
> that they occasionally edit, and always parsing the whole file, even in
> the absence of syntax errors, was too slow.
>
> >> The alternative to requiring after-change hooks is to always do a full
> >> parse for every call of fontify-region or indent-region. That is far
> >> too slow.
> >
> > Even for indentation, a full parse should not be needed.  You need to
> > only parse the outermost enclosing function/procedure, right?  That's
> > rarely the full buffer, except when the buffer is small.
>
> As discussed above, that depends on your language; in Ada it is _always_
> the full buffer. And finding the start of a function in C and C++ is hard.
>
> >> Note that Tree-Sitter requires one full parse of the buffer to generate
> >> the parse tree that is later updated incrementally; in an unmodified
> >> buffer, only that one parse is needed.
> >
> > Tree-sitter cannot know what the full buffer holds, so nothing
> > prevents us from passing it just part of the buffer.  After all,
> > tree-sitter should be able to do a decent job when the part we pass to
> > it actually _is_ all we have in the buffer, right?
>
> Same issues as above.
>
> >> > And they cannot pick up every relevant change; for example, what
> >> > happens if some face used for font-lock is modified?
> >>
> >> Yes, that is a flaw. Not likely to occur in everyday use.
> >
> > Redisplay cannot rely on something being "unlikely", because it's
> > expected to produce correct results in all situations.
>
> The flaw is not in ada-mode's use of a parser or after-change-functions;
> it's a general problem with font-lock.
>
> The face values are applied to the buffer text as text properties
> containing the symbol that holds the face to be used; for example
> (font-lock-face font-lock-function-name-face). If the contents of that
> symbol change, then redisplay must be rerun to apply the correct values.
> This does _not_ require a reparse; the parser sets the text property,
> and that has not changed.
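
A toy Python sketch of the distinction being drawn here (all names are
hypothetical, and real Emacs face handling is more involved): the parser
stores only a face name per character, and rendering looks that name up
in a face table at display time. Changing the table changes what is drawn
without any re-parse, but something still has to redraw.

```python
# Face definitions, looked up at display time (stand-in for Emacs face attributes).
face_table = {"font-lock-function-name-face": {"foreground": "blue"}}


class ToyBuffer:
    def __init__(self, text):
        self.text = text
        self.parse_count = 0
        self.face_names = self._parse()

    def _parse(self):
        # Hypothetical parser: marks each character of "Function_1" with a face name.
        self.parse_count += 1
        names = [None] * len(self.text)
        start = self.text.find("Function_1")
        if start >= 0:
            for pos in range(start, start + len("Function_1")):
                names[pos] = "font-lock-function-name-face"
        return names


def redisplay(buf, faces):
    """Resolve the stored face names to attributes; no parsing happens here."""
    return [faces.get(name) if name else None for name in buf.face_names]
```

For example, after `face_table["font-lock-function-name-face"] =
{"foreground": "orange"}`, calling `redisplay` again picks up the new
color while `parse_count` stays at 1.
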
>
> Use case: A c-mode buffer A is currently displayed in a window in a
> frame, it is syntactically correct, and all displayed faces are correct.
> In another frame, the user uses 'M-x set-variable' to change the value
> of font-lock-function-name-face.
>
> To update the display, something has to trigger redisplay of buffer A. I
> don't think using M-x set-variable in a different frame does that.
>
> Switching buffers in a frame does cause a redisplay (to update the menu
> and mode line); if M-x set-variable is done in the same frame as buffer
> A, the change in font-lock-function-name-face should show up as
> expected.
>
> A similar use case would be changing from "light mode" to "dark mode".
> That could be done by changing the theme using load-theme; that should
> force a redisplay (I assume it does; I have not checked).
>
> Other than the global face variables, ada-mode does not have any
> variables that control faces. Some other modes may, for example setting
> the level of highlighting to minimal or max. In that case, the font-lock
> regexps change, and the function that does that presumably sets
> fontified to nil in the current buffer, and should also force redisplay.
> If ada-mode adds a feature like this, there will be a function to change
> it (perhaps a custom variable change function) that also forces a
> reparse and redisplay.
>
> > I can understand why fontification methods that are too slow want to
> > get some help from hooks, but when we design and implement novel
> > fontification methods using fast parsers, we should first try doing
> > that without any hooks,
>
> Yes, premature optimization is evil. Using tree-sitter to implement
> font-lock should start by always parsing the whole buffer for every call
> of fontify-region. If that is fast enough, we're done. If not, we can
> consider whether parsing a smaller part of the buffer is possible.
>
> Note that the fact that tree-sitter provides incremental parse is a
> strong hint that the answer will be "it's not fast enough".
>
> >> >> By default font-lock runs after every character typed
> >> >
> >> > No, it only runs when redisplay kicks in.  If you type very quickly,
> >> > it won't run for every character.  At least AFAIR.
> >>
> >> What triggers redisplay?
> >
> > When Emacs is about to read input, if no input is available, it
> > performs redisplay.  IOW, Emacs enters redisplay when it's about to
> > become idle.
> >
> <snip>
> >> The elisp manual section "Forcing redisplay" says "Emacs normally tries
> >> to redisplay the screen whenever it waits for input." After I type the
> >> first character, it is no longer waiting for input, it is processing
> >> that character. I assume here "process that char code" includes running
> >> after-change-functions, which is (small) elisp code. But I guess after
> >> processing that char, before calling redisplay, it checks if there is
> >> more input, which should be true if I type fast enough. Perhaps "process
> >> that char code" is faster than the combination of my fingers and the
> >> keyboard char send rate?
> >
> > Yes, most probably.
>
> Ok, so in practice, it is not possible to type fast enough, and
> font-lock runs after every character typed.
>
> > In other similar situations (e.g., in Flyspell mode) we wait for some
> > non-zero idle time before actually running the code which could react
> > to slow typing with annoying messages.
>
> Since font-lock is running a parser, it detects syntax errors. I
> could delay the display of the fringe mark, without delaying font-lock
> itself. I'll put that on my list.
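
A minimal sketch of that idea in Python, with an explicit clock so it is
deterministic. DeferredErrorMarks and its methods are hypothetical; in
Emacs the equivalent would more likely be built on an idle timer such as
run-with-idle-timer. It only models the marks: each parse (and hence each
fontification) records its errors immediately, but visible_marks reports
them only once no new parse has happened for `delay` seconds.

```python
class DeferredErrorMarks:
    """Records syntax errors on every parse; shows them only after `delay` idle seconds."""

    def __init__(self, delay=1.0):
        self.delay = delay
        self.errors = []
        self.last_parse = None  # time of the most recent parse, if any

    def after_parse(self, errors, now):
        # Called on every parse: store the errors, but do not show them yet.
        self.errors = list(errors)
        self.last_parse = now

    def visible_marks(self, now):
        # Hide marks while the user is still typing (recent parse).
        if self.last_parse is None or now - self.last_parse < self.delay:
            return []
        return self.errors
```
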
>
> --
> -- Stephe