From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: emacs-tree-sitter and Emacs
Date: Thu, 02 Apr 2020 17:03:43 +0300
Message-ID: <83v9mix9vk.fsf@gnu.org>
References: <CAAxewyhxQek3K7yjSPXm3-7VLxLu92HpZOJKGKcL2eFbePg3dQ@mail.gmail.com>
 <83eeta3sa0.fsf@gnu.org> <86369ojbig.fsf@stephe-leake.org>
 <83lfnfz6jr.fsf@gnu.org> <864ku3htmb.fsf@stephe-leake.org>
Injection-Info: ciao.gmane.io; posting-host="ciao.gmane.io:159.69.161.202";
	logging-data="42711"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: emacs-devel@gnu.org
To: Stephen Leake <stephen_leake@stephe-leake.org>
In-Reply-To: <864ku3htmb.fsf@stephe-leake.org> (message from Stephen Leake on
 Wed, 01 Apr 2020 11:51:40 -0800)
Xref: news.gmane.io gmane.emacs.devel:246257
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/246257>

> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Wed, 01 Apr 2020 11:51:40 -0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Can you tell in more detail why you need to rely on these hooks?  They
> > shouldn't be necessary, AFAIU.  
> 
> It is an optimization choice.
> 
> In an unmodified buffer, that is smaller than 100,000 characters
> (default setting of wisi-partial-parse-threshold), the entire buffer is
> parsed once; that applies faces to all the Ada identifiers that need
> faces (standard font-lock regexp handles the reserved words). Then when
> font-lock fontifies a region, no parsing is needed.

But why do you need that initial full parse in the first place?  Is
parsing parts of the buffer so much harder?

> Indent is similar; the parse sets text properties holding the indent for
> each line; indent-region then applies them.

Indent is a different use case: it happens by user command, and thus
has different time restrictions than redisplay.

> If the default setting of jit-lock-defer-time (ie nil) is used, then
> font-lock runs immediately after each change, and the after-change hooks
> are not needed. But as I have mentioned, I always run with
> jit-lock-defer-time set to 1.0 (because parsing is not fast enough in
> some cases), so the change hooks are needed.

AFAIU, tree-sitter and similar parsers are supposed to be much faster,
so the workarounds for slow parsing may not be necessary, if slow
parsing is the only reason for using the hooks.

> The alternative to not requiring after-change hooks is to always do a full
> parse, for every call of fontify-region or indent-region. That is far too
> slow.

Even for indentation, a full parse should not be needed.  You only
need to parse the outermost enclosing function/procedure, right?  That's
rarely the full buffer, except when the buffer is small.
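
To illustrate, here is a minimal sketch of restricting a parse to the
enclosing defun.  The names my-enclosing-defun-bounds and
my-parse-enclosing-defun are hypothetical, and PARSE-FN stands in for
whatever parser entry point a mode uses; only bounds-of-thing-at-point,
save-restriction and narrow-to-region are standard Emacs facilities.

```elisp
(require 'thingatpt)

(defun my-enclosing-defun-bounds ()
  "Return (BEG . END) of the defun around point, or nil if there is none.
Hypothetical helper; it relies on the major mode's notion of a defun."
  (bounds-of-thing-at-point 'defun))

(defun my-parse-enclosing-defun (parse-fn)
  "Call PARSE-FN with the buffer narrowed to the defun around point.
PARSE-FN is a stand-in for a real parser entry point.  When point is
not inside a defun, PARSE-FN sees the whole accessible buffer."
  (let ((bounds (my-enclosing-defun-bounds)))
    (save-restriction
      (when bounds
        (narrow-to-region (car bounds) (cdr bounds)))
      (funcall parse-fn))))
```

Calling (my-parse-enclosing-defun #'buffer-string) with point inside a
function shows what a parser would see: just that function's text.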

> Note that Tree-Sitter requires one full parse of the buffer to generate
> the parse tree that is later updated incrementally; in an unmodified
> buffer, only that one parse is needed.

Tree-sitter cannot know what the full buffer holds, so nothing
prevents us from passing it just part of the buffer.  After all,
tree-sitter should be able to do a decent job when the part we pass to
it actually _is_ all we have in the buffer, right?

> > And they cannot pick up every relevant change; for example, what
> > happens if some face used for font-lock is modified?
> 
> Yes, that is a flaw. Not likely to occur in everyday use

Redisplay cannot rely on something being "unlikely", because it's
expected to produce correct results in all situations.  Incorrect
display is one of the worst bugs that can happen in an editor.  In a
modified buffer that is not yet syntactically correct we can get away
with slightly incorrect fontifications, but missing face changes will
produce horribly incorrect results even if nothing has changed
syntactically.

That is why I think we should try to avoid using hooks for
fontification as much as we can.  I can understand why fontification
methods that are too slow want to get some help from hooks, but when
we design and implement novel fontification methods using fast
parsers, we should first try doing that without any hooks, because we
already know, from the bitter experience of Emacs 19, that using hooks
is a dead end.  We developed jit-lock in Emacs 21 precisely to avoid
using such hooks, because we realized that those old methods won't
work well enough.
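
As a rough sketch of the hook-free approach, a fontification function
can be registered with jit-lock, which then calls it with just the
region that is about to be displayed.  The names my-fontify-region and
my-fontify-mode are hypothetical, and the regexp search is only a
stand-in for asking a real parser about the region; jit-lock-register
and jit-lock-unregister are the standard entry points.

```elisp
(defun my-fontify-region (start end)
  "Fontify only the text between START and END.
Hypothetical stand-in for a parser-driven fontifier: a real one would
ask the parser about this region; here a regexp marks \"TODO\" so that
the sketch stays self-contained."
  (save-excursion
    (save-match-data
      (goto-char start)
      (while (re-search-forward "\\_<TODO\\_>" end t)
        (put-text-property (match-beginning 0) (match-end 0)
                           'face 'font-lock-warning-face))))
  nil)

(define-minor-mode my-fontify-mode
  "Toy minor mode that fontifies on demand through jit-lock."
  :lighter nil
  (if my-fontify-mode
      (jit-lock-register #'my-fontify-region)
    (jit-lock-unregister #'my-fontify-region)))
```

Note that no after-change hook is installed by this code: jit-lock
decides when a region needs (re)fontification and calls the function
from redisplay.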

> >> By default font-lock runs after every character typed
> >
> > No, it only runs when redisplay kicks in.  If you type very quickly,
> > it won't run for every character.  At least AFAIR.
> 
> What triggers redisplay?

When Emacs is about to read input, if no input is available, it
performs redisplay.  IOW, Emacs enters redisplay when it's about to
become idle.

> In practice, I and other ada-mode users notice font-lock running after
> each character, with the default setting of jit-lock-defer-time. There
> is a comment in jit-lock.el indicating that the default value may have
> been 0.25 at one point (I did not check the git history); perhaps you
> are remembering that behavior?

The 0.25 value is just a reminder of the default timing of a similar
feature in lazy-lock (RIP), used in Emacs 19.  AFAIR, we never had
jit-lock-defer-time non-nil by default in Emacs, because during
development of Emacs 21 the consensus was that its effect is too
surprising, and because (at least in those days) the default jit-lock
was fast enough for us to be able to leave the deferred fontification
disabled.
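
For reference, enabling deferred fontification as described in the
quoted text is a one-line setting.  The value 1.0 is the one mentioned
above; putting it in the init file is an assumption about where a user
would normally set it.

```elisp
;; Assumption: this goes in the init file, so that it is in effect
;; before jit-lock is turned on in any buffer.  nil (the default)
;; means fontify without deferring; a number is the idle time, in
;; seconds, after which deferred fontification takes place.
(setq jit-lock-defer-time 1.0)
```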

> For example, in Ada the comment-start is "--". No matter how fast I type
> the two chars, ada-mode reports a syntax error after the first one.

That means you don't type fast enough, at least relative to your CPU
speed.

> I don't think there's anything in ada-mode that forces a redisplay
> (except explicitly calling wisi-parse-buffer; that calls
> font-lock-ensure). But I'd be happy to investigate further if you are
> sure it should not work this way.

In other similar situations (e.g., in Flyspell mode) we wait for some
non-zero idle time before actually running the code which could react
to slow typing with annoying messages.
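
A minimal sketch of that idle-time pattern, using run-with-idle-timer.
All the my-syntax-report-* names and the 0.5 second delay are
hypothetical, and my-syntax-report is only a placeholder for the code
that would actually consult the parser.

```elisp
(defvar my-syntax-report-delay 0.5
  "Hypothetical idle time, in seconds, before reporting syntax errors.")

(defvar my-syntax-report-timer nil
  "Idle timer object created by `my-syntax-report-start'.")

(defun my-syntax-report ()
  "Placeholder for the code that would report syntax errors.
A real mode would consult its parser here."
  (message "Checking syntax in %s" (buffer-name)))

(defun my-syntax-report-start ()
  "Arrange for `my-syntax-report' to run each time Emacs becomes idle."
  (unless (timerp my-syntax-report-timer)
    (setq my-syntax-report-timer
          (run-with-idle-timer my-syntax-report-delay t
                               #'my-syntax-report))))

(defun my-syntax-report-stop ()
  "Cancel the idle timer started by `my-syntax-report-start'."
  (when (timerp my-syntax-report-timer)
    (cancel-timer my-syntax-report-timer)
    (setq my-syntax-report-timer nil)))
```

With this arrangement, typing "--" at normal speed would not trigger
the report between the two characters, as long as the pause between
them is shorter than the delay.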

> The elisp manual section "Forcing redisplay" says "Emacs normally tries
> to redisplay the screen whenever it waits for input." After I type the
> first character, it is no longer waiting for input, it is processing
> that character. I assume here "process that char code" includes running
> after-change-functions, which is (small) elisp code. But I guess after
> processing that char, before calling redisplay, it checks if there is
> more input, which should be true if I type fast enough. Perhaps "process
> that char code" is faster than the combination of my fingers and the
> keyboard char send rate?

Yes, most probably.

> Hmm. M-x (execute-kbd-macro "--") does not show a syntax-error fringe
> blink. I'm not sure if that is relevant here.

I think it is, because it injects the characters through the same
input queue as when you type.  It just does that much faster.

> I mentioned above that the parser is only too slow when there is a bad
> syntax error, and recovery is slow. However, that is the typical case
> while editing code.

AFAIU, producing reasonably good results in this case is one of the
explicit design goals of tree-sitter.  So it might be much better in
these situations.  But I have no first-hand experience to tell if
that's indeed so.