On Wed, Aug 10, 2022, 1:44 PM Eli Zaretskii <eliz@gnu.org> wrote:

>
> Really?  Then please tell me how is it that we the humans can detect
> incorrect fontifications even when shown partial strings and comments?
> We know that fontifications are incorrect, and where strings or
> comments start or end immediately, just after a single glance.  We
> never need to go to BOB to find that out.


Serious question: is fontification intended to display text according to
what the author probably intended, or according to how a compiler will
process that text (leaving correctness to a more precise tool than
font-lock, whether Semantic, tree-sitter, LSP, whatever)?
I ask because I can definitely write code with a subtle issue that I will
miss: code that I think should display one way but that will actually be
processed in a different way.  Should fontification show my likely
intention (plus, and only for bonus points, possibly highlight the error
that separates the likely intent from the actual parse), or should it
display the text the way the tools will interpret it, so that the author
finds errors that way?

When I use a dedicated IDE of recent vintage, it feels less like I am
writing a stream of characters than filling in partially constructed
objects representing the abstract syntax of the language I'm writing in
(with grammar that has allowances for incomplete or erroneous constructs),
with the text being displayed as a representation of the underlying
object.  IOW, the relationship of the syntactic object and the text is
inverted compared to Emacs's design, where (if I understand correctly) the
properties of the syntactic object are only tied to the text through text
properties.  With the other approach, the fontification and the syntax
object are tied together, but with Emacs the relationship seems much more
tenuous.  E.g., completion and fontification are completely separate
activities as far as I know, though the same contextual information should
be useful for both activities.

I have this CC-mode derived mode for a DSL I did not design.  I'm currently
the sole user of the mode, so I just wanted something quick and dirty.  But
as the pile of code I deal with in this DSL grows, I want to put in
Semantic support for it to get context-aware completion, precise
fontification, etc.  The current discussion has made me wonder if deriving
from CC mode is having some non-obvious effects on how font-lock works,
making it non-local in ways that are not necessary, so the re-entrant
nature of the Semantic parsers won't cure some of the slowness.  For
example, I want to use the font-lock of that mode in the REPL to fontify
the statements/expressions I enter at the prompt but ignore all other
text, particularly at the beginning and the end of the REPL buffer.  I
don't want to narrow the buffer, just the area fontification applies to.
Fontifying hundreds of megabytes of tracing print statements is not just
unnecessary, it's bad news for the GC even after the buffer is cleared, IME.

If CC mode is determining more syntactic information than tree-sitter's
incremental parsing provides (per Immanuel Litzroth's comment in this
thread), then there is a disconnect somewhere in the scope of expectations
for what font-lock is supposed to do.  I'm certainly not clear (yet) on how
to cleanly separate a proper syntactic analysis from fontification and then
rejoin them, or on whether there is "an Emacs way" to do it.

Lynn