From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail
From: Dmitry Gutov <dgutov@yandex.ru>
Newsgroups: gmane.emacs.devel
Subject: Re: emacs rendering comparisson between emacs23 and emacs26.3
Date: Thu, 16 Apr 2020 04:43:23 +0300
Message-ID: <6d65d90c-178e-87e2-68dd-236275a5e038@yandex.ru>
References: <83r1x1sqkx.fsf@gnu.org>
 <c60ad734-cee1-a40b-1027-e4575799d161@yandex.ru> <83lfn9s63n.fsf@gnu.org>
 <c73564b8-f6af-5c61-5fe6-4fa142010323@yandex.ru> <83h7xvqsgc.fsf@gnu.org>
 <90749329-ccb1-f96e-29c0-b4ecbb81d1d4@yandex.ru> <20200407174217.GC4009@ACM>
 <50acd968-4459-2fab-1609-7869e1ed072a@yandex.ru> <20200408020913.GA3992@ACM>
 <a8eb7e65-c5c8-ce55-68af-c27965d02c5c@yandex.ru> <20200412153458.GA5249@ACM>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: ciao.gmane.io; posting-host="ciao.gmane.io:159.69.161.202";
	logging-data="4685"; mail-complaints-to="usenet@ciao.gmane.io"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.4.1
Cc: rudalics@gmx.at, Eli Zaretskii <eliz@gnu.org>, rrandresf@gmail.com,
 rms@gnu.org, emacs-devel@gnu.org
To: Alan Mackenzie <acm@muc.de>
In-Reply-To: <20200412153458.GA5249@ACM>
Content-Language: en-US
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
 recognized.
X-Received-From: 2a00:1450:4864:20::429
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:247066
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/247066>

Hi Alan,

On 12.04.2020 18:34, Alan Mackenzie wrote:
>>> But this would merely transfer the start up time to the
>>> time taken in early scrolls forward.
> 
>> Not really. The start up scans the whole buffer, doesn't it? The early
>> scrolls forward would still scan only a fraction of it.
> 
> I'm thinking more about "scrolling" to the function in the file that one
> wants to work on or look at.  On average, this will be a little more
> than half way through the file (there is often a large comment block at
> BOB).  So you'd only be saving about half of CC Mode's start-up scan.

Yes, on average it's only a 2x benefit. But if the buffer opens at its 
beginning, the rest of the initialization gets spread across subsequent 
user commands, which should further improve apparent responsiveness.

>> Inserting characters can alter the syntax state of the whole buffer. At
>> least that's true for some of them. Full buffer scan sounds inevitable
>> in those cases.
> 
> Full buffer scans are very unusual.  Inserting a " where every
> subsequent line ended with a backslash might do that.  Inserting a C++ <
> wouldn't - its effect is limited to up to the next brace or semicolon.
> 
> Inserting a C++ raw string opener does typically necessitate a full
> scan (a search for a matching closer), but that would also be the case
> using syntax-propertize.

Not really. It would just mark the opener as a string opener (maybe with 
some extra text property), and that's that. Then font-lock would fontify 
the following text as string contents (until the end of the window or a 
little bit farther). When you type the closer, it only has to scan a 
short way back (it'll call syntax-ppss to find the string opener), the 
closer is propertized as appropriate, and that's that. No full buffer 
scans at any step.
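
To make that concrete, here is a rough sketch of such a 
syntax-propertize-function. It is only an illustration under some 
assumptions: all the my-raw-string-* names are made up, the delimiter 
regexp is simplified (no 16-character limit, no u8R/LR prefixes), and it 
has not been tried against CC Mode itself.

```elisp
(require 'syntax)

;; Hypothetical names throughout; this is a sketch, not CC Mode code.
(defconst my-raw-string-delim-re "[^ ()\\\n\"]*"
  "Characters allowed in a raw string delimiter (simplified).")

(defun my-raw-string--close (limit)
  "If point is inside a raw string, mark its closing quote before LIMIT.
When no closer is found, just move point to LIMIT."
  (let ((ppss (syntax-ppss)))
    (when (eq t (nth 3 ppss))           ; inside a generic-fence string
      ;; Look back at the opener (position 8 of the state) to
      ;; recover the delimiter, e.g. "foo" in R"foo(...)foo".
      (let ((delim (save-excursion
                     (goto-char (nth 8 ppss))
                     (and (looking-at
                           (concat "\"\\(" my-raw-string-delim-re "\\)("))
                          (match-string-no-properties 1)))))
        (when (and delim
                   (search-forward (concat ")" delim "\"") limit 'move))
          ;; Generic string fence on the closing quote.
          (put-text-property (1- (point)) (point)
                             'syntax-table (string-to-syntax "|")))))))

(defun my-raw-string-syntax-propertize (start end)
  "Apply syntax-table properties to raw strings between START and END."
  (goto-char start)
  ;; START may already be inside a raw string opened earlier.
  (my-raw-string--close end)
  (while (and (< (point) end)
              (re-search-forward
               (concat "R\\(\"\\)" my-raw-string-delim-re "(") end t))
    (let ((opener-pos (match-beginning 0))
          (quote-pos (match-beginning 1)))
      ;; Ignore matches inside ordinary strings or comments.
      (unless (let ((ppss (save-excursion (syntax-ppss opener-pos))))
                (or (nth 3 ppss) (nth 4 ppss)))
        ;; Generic string fence on the opening quote.
        (put-text-property quote-pos (1+ quote-pos)
                           'syntax-table (string-to-syntax "|"))
        (my-raw-string--close end)))))
```

A mode would install it with (setq-local syntax-propertize-function 
#'my-raw-string-syntax-propertize). Nothing here searches past END: an 
unterminated opener simply leaves the state "in string" for the next chunk.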

I recall that fontifying the rest of the buffer as text after a simple 
string opener could be a sore topic for you, but raw strings should be 
rare enough (aren't they?), or if they are not, fontification logic 
could opt to do something different, while syntax-table properties will 
be applied the "correct" way.

> So, we would merely be moving functions from
> c-get-state-before-change-functions and c-before-font-lock-functions
> (effectively lists of before-/after-change functions) to
> s-p-extend-region-f, together with adaptation. Would you agree that
> such a change to CC Mode would be largely pointless if some of these
> functions had to remain on c-get-state-b-c-f and c-before-f-l-f?

Yes, I think before-change-functions should become empty. Or much emptier.

> But the way s-p-extend-region-f functions are called is to keep calling
> them repeatedly until they've all said "no change" together.  This would
> dramatically slow down CC Mode, where currently these functions are each
> called exactly once.

Here's a simple solution: create a single function, specific to CC Mode, 
that would do all of that, and add it to s-p-extend-region-f.
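
Roughly along these lines, as a sketch only: my-cc-extend-region-functions 
and the other my-cc-* names are hypothetical, and I'm assuming each of the 
existing functions can be adapted to take (START END) and return nil or 
(NEW-START . NEW-END).

```elisp
;; Hypothetical names; a sketch of the idea, not actual CC Mode code.
(defvar my-cc-extend-region-functions nil
  "Functions called as (FN START END), each returning nil or a cons
\(NEW-START . NEW-END).")

(defun my-cc-extend-region-once (start end)
  "Run each function in `my-cc-extend-region-functions' once, in order.
Return the combined region as (NEW-START . NEW-END), or nil if no
function changed anything."
  (let ((new-start start)
        (new-end end))
    (dolist (fn my-cc-extend-region-functions)
      (let ((res (funcall fn new-start new-end)))
        (when res
          ;; Only ever grow the region.
          (setq new-start (min new-start (car res))
                new-end (max new-end (cdr res))))))
    (unless (and (= new-start start) (= new-end end))
      (cons new-start new-end))))

(defun my-cc-setup-extend-region ()
  "Install the wrapper buffer-locally (to be called from the mode setup)."
  (add-hook 'syntax-propertize-extend-region-functions
            #'my-cc-extend-region-once nil t))
```

One caveat: after the wrapper returns a non-nil value, the hook calls it 
again with the enlarged region until it returns nil, so the inner 
functions typically run twice per pass rather than strictly once.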

But there might be an even better way to do that. I'm not the best person 
to discuss that with.

> Also, the syntax-propertize mechanism is weaker than CC Mode's: When it
> is run, there is no way of knowing whether it's being called as a change
> function, and if it is, OLD-LEN is discarded.  How can it have access to
> variables set in before-change-functions?  (An example of such is
> c-raw-string-end-delim-disrupted.  In before change, it is set when the
> existing raw string end delimiter is about to cease to be.  In after
> change, the fact of this flag being nil means we don't need to search
> for an alternative closing delimiter, etc.  This change can obviously
> not be detected in an after-change function.)

As we seem to agree, before-change-functions should not be needed. 
Nor should the code that tracks the details of the edits the user makes. 
That alone can simplify some things.

>> I'm pretty sure I have thought of that example because it's an instance
>> of a syntax problem that's easy enough to solve within
>> syntax-propertize-function framework.
> 
> Having actually gone through all the issues and implemented raw strings,
> I can't agree with you there.  There are all sorts of subtleties which
> necessitate intimate cooperation between the before-change-functions and
> after-change-functions.  Such cooperation seems to be excluded by the
> syntax-propertize mechanism.

It encourages a different approach. Again: there are examples of raw 
string support in other major modes.

>>> Consider(2) a C++ template: excusing my C++ syntax knowledge, type in
> 
>>>      template class foo < bar, baz >= bar>
> 
>>> , perhaps typing in the odd newline inside the template (a common
>>> occurrence), or nesting further templates inside it (also a common
>>> occurrence).  Note how the parenthesis text properties are added and
>>> removed as you type.  All these modification are necessary, and they are
>>> largely _before_ the point of insertion, not after it.
> 
>> The current implementation of applying these properties can probably be
>> transferred into a syntax-propertize-function with only modest changes.
> 
> Maybe, but with a slowdown.  More of these properties will get erased
> than needed (with nested template forms), and they will all need to get
> put back again.

We won't really know until we can measure the result.

>>>> Some scenarios can become slower, that's for sure. But the more common
>>>> ones can get faster. We won't know until we try.
> 
> Other than starting up a buffer, we still haven't identified any
> specific scenarios where speed up might happen.

When before-change-functions only contains (t syntax-ppss-flush-cache), 
that can visibly change the performance tradeoffs.

> I don't think the syntax-propertize mechanism is all that brilliant.
> It's too constrained, and places too many restrictions on what can be
> done with the syntax-table text property.  For example, (from
> syntax.el):

I wouldn't say it's perfect either. But it's proven helpful over the 
years, and provided a base design for a lot of major mode implementations.

> (defvar syntax-propertize-function nil
>    ;; Rather than a -functions hook, this is a -function because it's easier
>    ;; to do a single scan than several scans: with multiple scans, one cannot
>    ;; assume that the text before point has been propertized, so syntax-ppss
>    ;; gives unreliable results (and stores them in its cache to boot, so we'd
>    ;; have to flush that cache between each function, and we couldn't use
>    ;; syntax-ppss-flush-cache since that would not only flush the cache but also
>    ;; reset syntax-propertize--done which should not be done in this case).
> 
> From my point of view, "multiple scans" are _much_ easier.  They are
> prohibited here only because syntax-ppss and syntax-propertize-function
> have got themselves tied up in a tight knot.  One answer would be not to
> use syntax-ppss inside a s-p-function.  (CC Mode doesn't use syntax-ppss
> at all).  Another answer would be to give the responsibility of removing
> the s-t text properties to the s-p-function.

I think we could extend the customizability in that direction. But first 
we'd have to see clear evidence that the current design is not good 
enough (e.g. CC Mode has been reimplemented on top of it, and the result 
is decidedly not fast enough).

>> Simply collaborating with one other developer on an overhaul project
>> (whether it succeeds or not; perhaps partially) can improve on that.
> 
> But take a massive amount of time.

You could consider it an investment.