From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail
From: Alan Mackenzie <acm@muc.de>
Newsgroups: gmane.emacs.devel
Subject: Re: emacs rendering comparisson between emacs23 and emacs26.3
Date: Sun, 12 Apr 2020 15:34:58 +0000
Message-ID: <20200412153458.GA5249@ACM>
References: <83r1x1sqkx.fsf@gnu.org>
 <c60ad734-cee1-a40b-1027-e4575799d161@yandex.ru>
 <83lfn9s63n.fsf@gnu.org>
 <c73564b8-f6af-5c61-5fe6-4fa142010323@yandex.ru>
 <83h7xvqsgc.fsf@gnu.org>
 <90749329-ccb1-f96e-29c0-b4ecbb81d1d4@yandex.ru>
 <20200407174217.GC4009@ACM>
 <50acd968-4459-2fab-1609-7869e1ed072a@yandex.ru>
 <20200408020913.GA3992@ACM>
 <a8eb7e65-c5c8-ce55-68af-c27965d02c5c@yandex.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: rudalics@gmx.at, Eli Zaretskii <eliz@gnu.org>, rrandresf@gmail.com,
 rms@gnu.org, emacs-devel@gnu.org
To: Dmitry Gutov <dgutov@yandex.ru>
Content-Disposition: inline
In-Reply-To: <a8eb7e65-c5c8-ce55-68af-c27965d02c5c@yandex.ru>
Xref: news.gmane.io gmane.emacs.devel:246867
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/246867>

Hello, Dmitry.

On Fri, Apr 10, 2020 at 06:33:13 +0300, Dmitry Gutov wrote:
> Hi Alan,

> On 08.04.2020 05:09, Alan Mackenzie wrote:

> >> I will shut up about it now (saying it twice is plenty), but I am pretty
> >> confident saying that if you manage to migrate to s-p-f, file opening
> >> time will go down.

> > I'm sure it would.  If file opening time were really a concern, a hybrid
> > algorithm would perhaps be the best way: apply the text properties first
> > in a lazy fashion, and thereafter treat them with care, as CC Mode
> > currently does.

> s-p-f would help with the first step, and as for "treating them with 
> care", it would be good for us all to see how much of that is really 
> needed. Improving syntax-propertize is not out of the question, and it 
> might benefit several major modes, not just the CC collection.

> > But this would merely transfer the start up time to the
> > time taken in early scrolls forward.

> Not really. The start up scans the whole buffer, doesn't it? The early 
> scrolls forward would still scan only a fraction of it.

I'm thinking more about "scrolling" to the function in the file that one
wants to work on or look at.  On average, this will be a little more
than halfway through the file (there is often a large comment block at
BOB).  So you'd only be saving about half of CC Mode's start-up scan.

> >> Performance while typing is likely to improve too, at least when the
> >> same buffer is not shown in another window, many thousand lines later.

> > What makes you think this?

> Inserting characters can alter the syntax state of the whole buffer. At 
> least that's true for some of them. Full buffer scan sounds inevitable 
> in those cases.

Full buffer scans are very unusual.  Inserting a " where every
subsequent line ended with a backslash might do that.  Inserting a C++ <
wouldn't; its effect is limited to the text up to the next brace or
semicolon.
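To illustrate how local that effect is, here is a minimal sketch, assuming a hypothetical helper `my-template-angle-bracket-bound` (not part of CC Mode) that returns the position at which the influence of a newly inserted < ends, namely just before the next brace or semicolon.  Real CC Mode also has to cope with comments, strings and nested templates, which this deliberately ignores.

```elisp
;; Hypothetical helper, NOT part of CC Mode.  Return the position up to
;; which a `<' inserted at POS could affect syntax-table properties,
;; i.e. just before the next brace or semicolon, or `point-max' if there
;; is none.  Comments, strings and nested templates are ignored here.
(defun my-template-angle-bracket-bound (pos)
  (save-excursion
    (goto-char pos)
    (skip-chars-forward "^{};")
    (point)))
```

In a buffer containing "a < b; c", calling it on the < stops just before the semicolon, so only a few characters need re-examining.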

Inserting a C++ raw string opener does typically necessitate a full
scan (a search for a matching closer), but that would also be the case
using syntax-propertize.

> >> "Considerable enhancement" can also be a part of that discussion.

> > The syntax-propertize-function mechanism works by erasing ALL
> > syntax-table properties after a change point, then reapplying them
> > lazily.

> That's not right. It only erases syntax-table properties in a chunk 
> before calling syntax-propertize-function on the same range of 
> positions. IOW, it overwrites them lazily as well.

Sorry, I was mistaken there.  The bounds for erasing and re-applying the
s-t props are determined (except in simple cases) by
syntax-propertize-extend-region-functions.

So, we would merely be moving functions from
c-get-state-before-change-functions and c-before-font-lock-functions
(effectively lists of before-/after-change functions) to
s-p-extend-region-f, with some adaptation.  Would you agree that
such a change to CC Mode would be largely pointless if some of these
functions had to remain on c-get-state-b-c-f and c-before-f-l-f?

But s-p-extend-region-f functions are called repeatedly: whenever one
of them extends the region, the whole list is gone through again, until
they have all said "no change" in a single pass.  This would
dramatically slow down CC Mode, where currently these functions are each
called exactly once.
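A simplified model of that calling convention may make the cost clearer.  This is a sketch under stated assumptions: `my-extend-region-to-fixpoint` is a hypothetical name and the real loop in syntax.el differs in detail, but the functions follow the documented contract of syntax-propertize-extend-region-functions: each is called with START and END and returns either nil or (NEW-START . NEW-END).

```elisp
;; Simplified model of how `syntax-propertize' consults
;; `syntax-propertize-extend-region-functions'.  The function name is
;; hypothetical; the real loop in syntax.el differs in detail.
(defun my-extend-region-to-fixpoint (funs start end)
  "Call each of FUNS with START and END until none extends the region.
Each function returns nil or a cons (NEW-START . NEW-END).  Whenever a
function extends the region, start again from the head of FUNS, since
an earlier function may now want a different answer.  Return the final
region as (START . END)."
  (let ((pending funs))
    (while pending
      (let ((new (funcall (pop pending) start end)))
        (when (and new
                   (or (< (car new) start) (> (cdr new) end)))
          ;; The region grew: go through the whole list again.
          (setq start (min start (car new))
                end (max end (cdr new))
                pending funs))))
    (cons start end)))
```

With two functions that each need to extend the region once, this model makes five calls rather than two, and every further extension triggers another pass over the list.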

Also, the syntax-propertize mechanism is weaker than CC Mode's: When it
is run, there is no way of knowing whether it's being called as a change
function, and if it is, OLD-LEN is discarded.  How can it have access to
variables set in before-change-functions?  (An example of such is
c-raw-string-end-delim-disrupted.  In before change, it is set when the
existing raw string end delimiter is about to cease to be.  In after
change, the fact of this flag being nil means we don't need to search
for an alternative closing delimiter, etc.  This disruption obviously
cannot be detected in an after-change function, since by then the old
delimiter has already gone.)
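A minimal sketch of the kind of cooperation meant here follows.  All the names are hypothetical: the `my-raw-string-end` text property stands in for however the closing delimiter is recorded, and CC Mode's real c-raw-string-end-delim-disrupted logic is considerably more involved.  The point is only that the flag must be computed in the before-change function, while the text still exists, and consumed in the after-change function.

```elisp
;; All names below are hypothetical.  The `my-raw-string-end' text
;; property stands in for however the closing delimiter is recorded.
(defvar-local my-end-delim-disrupted nil
  "Non-nil if the pending change removes part of a raw string closer.")

(defvar-local my-closer-searches 0
  "Number of times the (expensive) search for a new closer was run.")

(defun my-before-change (beg end)
  "Record whether the text between BEG and END holds closer characters.
This can only be known now, before the text is deleted."
  (setq my-end-delim-disrupted
        (and (text-property-not-all beg end 'my-raw-string-end nil) t)))

(defun my-after-change (_beg _end _old-len)
  "Search for a new closer only if `my-before-change' flagged one."
  (when my-end-delim-disrupted
    (setq my-closer-searches (1+ my-closer-searches))
    ;; A real implementation would search forward for an alternative
    ;; closing delimiter here and re-apply syntax-table properties.
    (setq my-end-delim-disrupted nil)))
```

In this sketch, deleting an ordinary character inside the string leaves the flag nil, so no search is done; deleting a character of the closer sets the flag and triggers exactly one search.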

> > Considering that s-t properties have an overwhelmingly local
> > effect, this is very wasteful of processor time.

> It would have been. As you can see, it's not a difficult problem to fix, 
> even if it were still present.

The lack of full information (see above) in the syntax-propertize
mechanism is a problem.

> > Consider, for example, editing within a large C++ raw string, a common
> > occurrence.  You yourself reported sluggish performance here as a bug
> > in mid-2016.  The cause was erasing too many s-t text properties at a
> > buffer change.  I think we were talking about 1 second per typed
> > character in the scenario you gave.  There are typically lots of these
> > properties in a raw string, in particular on " characters.

> I'm pretty sure I have thought of that example because it's an instance 
> of a syntax problem that's easy enough to solve within 
> syntax-propertize-function framework.

Having actually gone through all the issues and implemented raw strings,
I can't agree with you there.  There are all sorts of subtleties which
necessitate intimate cooperation between the before-change-functions and
after-change-functions.  Such cooperation seems to be excluded by the
syntax-propertize mechanism.

> > Consider(2) a C++ template: excusing my C++ syntax knowledge, type in

> >     template class foo < bar, baz >= bar>

> > , perhaps typing in the odd newline inside the template (a common
> > occurrence), or nesting further templates inside it (also a common
> > occurrence).  Note how the parenthesis text properties are added and
> > removed as you type.  All these modifications are necessary, and they are
> > largely _before_ the point of insertion, not after it.

> The current implementation of applying these properties can probably be 
> transferred into a syntax-propertize-function with only modest changes.

Maybe, but with a slowdown.  More of these properties will get erased
than needed (with nested template forms), and they will all need to get
put back again.

> >> Some scenarios can become slower, that's for sure. But the more common
> >> ones can get faster. We won't know until we try.

Other than starting up a buffer, we still haven't identified any
specific scenarios where a speed-up might happen.

> > Trying would be a _lot_ of work.  How is one to handle the common
> > example scenarios above?

> Stefan has offered to help. And I'm sure he could answer the follow-up 
> questions much better than I.

I've tried quite a few optimisations over the years.  Some have been
successful, but all too often I've put in a lot of work, then at the end
of it the profiler tells me it's just been a waste of time.  I strongly
suspect that that would be the result here, too.

> > Well, you'd have to enhance the syntax-propertize-function with a
> > means of determining a start position for erasing s-t props, and
> > also a stop position.

> The real-world uses of s-p-f out there already solve syntax problems of 
> comparable complexity. And move the start position, among other things.

OK.  I was mistaken there.

> > Once you do that, you're effectively doing what CC Mode currently
> > does, so where's the speed advantage coming from?

> From doing things more lazily, is how I see it. But I'm not an expert 
> on CC Mode architecture.

> Among other benefits, moving it to a standard-ish framework like s-p-f 
> could (possibly) simplify its code, as well as make it more approachable 
> for other developers already familiar with how most other major modes 
> are written. So far I wouldn't even know where to start fixing bugs in 
> it, and IMHO CC Mode currently has bus factor = 1. It's not great for 
> its future. I suspect it's not ideal for you either.

I don't think the syntax-propertize mechanism is all that brilliant.
It's too constrained, and places too many restrictions on what can be
done with the syntax-table text property.  For example, from
syntax.el:

(defvar syntax-propertize-function nil
  ;; Rather than a -functions hook, this is a -function because it's easier
  ;; to do a single scan than several scans: with multiple scans, one cannot
  ;; assume that the text before point has been propertized, so syntax-ppss
  ;; gives unreliable results (and stores them in its cache to boot, so we'd
  ;; have to flush that cache between each function, and we couldn't use
  ;; syntax-ppss-flush-cache since that would not only flush the cache but also
  ;; reset syntax-propertize--done which should not be done in this case).

From my point of view, "multiple scans" are _much_ easier.  They are
prohibited here only because syntax-ppss and syntax-propertize-function
have got themselves tied up in a tight knot.  One answer would be not to
use syntax-ppss inside a s-p-function.  (CC Mode doesn't use syntax-ppss
at all.)  Another answer would be to give the responsibility of removing
the s-t text properties to the s-p-function.
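As a rough illustration of that second answer, here is a minimal sketch of a hypothetical propertize function, `my-propertize-quotes-as-punctuation`, which removes stale syntax-table properties itself, and only over the range it is about to rescan, rather than leaving the removal to its caller.  The rule it applies (giving every `"` punctuation syntax) is an arbitrary stand-in, not CC Mode's.

```elisp
;; Hypothetical propertize function.  The rule it applies (every `"'
;; gets punctuation syntax) is an arbitrary stand-in, not CC Mode's.
(defun my-propertize-quotes-as-punctuation (start end)
  "Give every double quote between START and END punctuation syntax.
This function removes stale `syntax-table' properties itself, and only
between START and END, instead of relying on its caller to do so."
  (with-silent-modifications
    (remove-text-properties start end '(syntax-table nil))
    (save-excursion
      (goto-char start)
      (while (search-forward "\"" end t)
        (put-text-property (1- (point)) (point)
                           'syntax-table (string-to-syntax "."))))))
```

Properties outside the range passed in are left untouched, so the function, not the framework, decides how much to erase.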

> Simply collaborating with one other developer on an overhaul project 
> (whether it succeeds or not; perhaps partially) can improve on that.

But it would take a massive amount of time.

>    Cheers.

-- 
Alan Mackenzie (Nuremberg, Germany).