From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: emacs rendering comparisson between emacs23 and emacs26.3
Date: Sun, 12 Apr 2020 15:34:58 +0000
Message-ID: <20200412153458.GA5249@ACM>
To: Dmitry Gutov
Cc: rudalics@gmx.at, Eli Zaretskii, rrandresf@gmail.com, rms@gnu.org, emacs-devel@gnu.org

Hello, Dmitry.

On Fri, Apr 10, 2020 at 06:33:13 +0300, Dmitry Gutov wrote:
> Hi Alan,

> On 08.04.2020 05:09, Alan Mackenzie wrote:
> >> I will shut up about it now (saying it twice is plenty), but I am pretty
> >> confident saying that if you manage to migrate to s-p-f, file opening
> >> time will go down.

> > I'm sure it would. If file opening time were really a concern, a hybrid
> > algorithm would perhaps be the best way: apply the text properties first
> > in a lazy fashion, and thereafter treat them with care, as CC Mode
> > currently does.

> s-p-f would help with the first step, and as for "treating them with
> care", it would be good for us all to see how much of that is really
> needed. Improving syntax-propertize is not out of the question, and it
> might benefit several major modes, not just the CC collection.

> > But this would merely transfer the start up time to the
> > time taken in early scrolls forward.

> Not really. The start up scans the whole buffer, doesn't it? The early
> scrolls forward would still scan only a fraction of it.

I'm thinking more about "scrolling" to the function in the file that one
wants to work on or look at. On average, this will be a little more than
half way through the file (there is often a large comment block at BOB).
So you'd only be saving about half of CC Mode's start-up scan.

> >> Performance while typing is likely to improve too, at least when the
> >> same buffer is not shown in another window, many thousand lines later.

> > What makes you think this?

> Inserting characters can alter the syntax state of the whole buffer. At
> least that's true for some of them. Full buffer scan sounds inevitable
> in those cases.

Full buffer scans are very unusual. Inserting a " where every subsequent
line ended with a backslash might do that. Inserting a C++ < wouldn't:
its effect extends at most to the next brace or semicolon. Inserting a
C++ raw string opener does typically necessitate a full scan (a search
for a matching closer), but that would also be the case using
syntax-propertize.

> >> "Considerable enhancement" can also be a part of that discussion.

> > The syntax-propertize-function mechanism works by erasing ALL
> > syntax-table properties after a change point, then reapplying them
> > lazily.

> That's not right. It only erases syntax-table properties in a chunk
> before calling syntax-propertize-function on the same range of
> positions. IOW, it overwrites them lazily as well.

Sorry, I was mistaken there. The bounds for erasing and re-applying the
s-t props are determined (except in simple cases) by
syntax-propertize-extend-region-functions. So we would merely be moving
functions from c-get-state-before-change-functions and
c-before-font-lock-functions (effectively lists of before-/after-change
functions) to s-p-extend-region-f, suitably adapted. Would you agree that
such a change to CC Mode would be largely pointless if some of these
functions had to remain on c-get-state-b-c-f and c-before-f-l-f?
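A toy Python model can make the start-up arithmetic above concrete. It is not Emacs code: `LazyPropertizer`, its `done` marker, the 500-character chunk size and the buffer and window sizes are all hypothetical, chosen only to mimic the "propertized up to here" bookkeeping that any lazy scheme needs.

```python
class LazyPropertizer:
    """Toy model of lazy, chunked propertization (hypothetical, not Emacs code).

    `done` is a "propertized up to here" marker: text before it is assumed to
    carry correct properties; text after it has not been scanned yet.
    """

    def __init__(self, size: int, chunk: int = 500) -> None:
        self.size = size    # buffer length in characters
        self.chunk = chunk  # arbitrary chunk size for each scan step
        self.done = 0       # everything before `done` is propertized
        self.scanned = 0    # total characters (re)propertized so far

    def ensure(self, pos: int) -> None:
        """Propertize up to at least `pos`, one chunk at a time."""
        pos = min(pos, self.size)
        while self.done < pos:
            end = min(self.done + self.chunk, self.size)
            # A real implementation would erase and re-apply properties
            # on [done, end) here; the model only counts the work.
            self.scanned += end - self.done
            self.done = end

    def after_change(self, beg: int) -> None:
        """An edit at `beg` invalidates everything from `beg` onwards."""
        self.done = min(self.done, beg)


SIZE = 100_000            # hypothetical buffer size
eager_cost = SIZE         # an eager start-up scan touches every character

lazy = LazyPropertizer(SIZE)
# Jump to a function a little past the middle and show ~3000 characters.
lazy.ensure(55_000 + 3_000)
print(eager_cost, lazy.scanned)  # prints: 100000 58000
```

With these assumed numbers, jumping to a function a little past the middle propertizes 58,000 of 100,000 characters, so the saving over an eager scan is a bit under half, which is the order of magnitude estimated above. A later edit near the top (say `lazy.after_change(1_000)`) only moves the marker back; the cost is paid again when text after it is next displayed.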

But the way s-p-extend-region-f functions are called is to keep calling
them repeatedly until they've all said "no change" together. This would
dramatically slow down CC Mode, where currently these functions are each
called exactly once.

Also, the syntax-propertize mechanism is weaker than CC Mode's: when it
is run, there is no way of knowing whether it's being called as a change
function, and if it is, OLD-LEN is discarded. How can it have access to
variables set in before-change-functions? (An example of such a variable
is c-raw-string-end-delim-disrupted. In before-change, it is set when the
existing raw string end delimiter is about to cease to exist. In
after-change, this flag being nil means we don't need to search for an
alternative closing delimiter, etc. This change obviously cannot be
detected in an after-change function alone, since by then the old
delimiter text is already gone.)

> > Considering that s-t properties have an overwhelmingly local
> > effect, this is very wasteful of processor time.

> It would have been. As you can see, it's not a difficult problem to fix,
> even if it were still present.

The lack of full information (see above) in the syntax-propertize
mechanism is a problem.

> > Consider, for example, editing within a large C++ raw string, a common
> > occurrence. You yourself reported as a bug sluggish performance here in
> > mid 2016. The cause was erasing too many s-t text properties at a
> > buffer change. I think we were talking about 1 second per typed
> > character in the scenario you gave. There are typically lots of these
> > properties in a raw string, in particular on " characters.

> I'm pretty sure I have thought of that example because it's an instance
> of a syntax problem that's easy enough to solve within the
> syntax-propertize-function framework.

Having actually gone through all the issues and implemented raw strings,
I can't agree with you there. There are all sorts of subtleties which
necessitate intimate cooperation between the before-change-functions and
after-change-functions. Such cooperation seems to be excluded by the
syntax-propertize mechanism.

> > Consider(2) a C++ template: excusing my C++ syntax knowledge, type in
> >
> >     template class foo < bar, baz >= bar>
> >
> > , perhaps typing in the odd newline inside the template (a common
> > occurrence), or nesting further templates inside it (also a common
> > occurrence). Note how the parenthesis text properties are added and
> > removed as you type. All these modifications are necessary, and they
> > are largely _before_ the point of insertion, not after it.

> The current implementation of applying these properties can probably be
> transferred into a syntax-propertize-function with only modest changes.

Maybe, but with a slowdown. More of these properties will get erased than
needed (with nested template forms), and they will all need to get put
back again.

> >> Some scenarios can become slower, that's for sure. But the more common
> >> ones can get faster. We won't know until we try.

Other than starting up a buffer, we still haven't identified any specific
scenarios where a speed-up might happen.

> > Trying would be a _lot_ of work. How is one to handle the common
> > example scenarios above?

> Stefan has offered to help. And I'm sure he could answer the follow-up
> questions much better than I.

I've tried quite a few optimisations over the years. Some have been
successful, but all too often I've put in a lot of work, then at the end
of it the profiler tells me it's just been a waste of time. I strongly
suspect that that would be the result here, too.

> > Well, you'd have to enhance the syntax-propertize-function with a
> > means of determining a start position for erasing s-t props, and
> > also a stop position.

> The real-world uses of s-p-f out there already solve syntax problems of
> comparable complexity. And move the start position, among other things.

OK. I was mistaken there.
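The repeated calling of extend-region functions described above can be sketched as a fixed-point loop. In this toy Python model, `extend_region`, `to_template` and `to_statement` are hypothetical. Each extender mimics the contract of a function on syntax-propertize-extend-region-functions (take START and END, return nil or a new (START . END) pair), but the positions are invented, and the loop follows this message's "repeat until a whole pass reports no change" description rather than the exact control flow in syntax.el.

```python
from typing import Callable, Optional, Sequence, Tuple

Bounds = Tuple[int, int]
# Hypothetical extender: given (start, end), return None for "no change"
# or a (start, end) pair covering the region that must be reprocessed.
Extender = Callable[[int, int], Optional[Bounds]]


def extend_region(start: int, end: int,
                  funs: Sequence[Extender]) -> Tuple[Bounds, int]:
    """Widen (start, end) until one full pass over `funs` changes nothing.

    Returns the final bounds and the total number of extender calls.
    """
    calls = 0
    changed = True
    while changed:
        changed = False
        for fn in funs:
            calls += 1
            new = fn(start, end)
            if new is not None and (new[0] < start or new[1] > end):
                # This model only ever widens the region.
                start, end = min(start, new[0]), max(end, new[1])
                changed = True
    return (start, end), calls


def to_template(start: int, end: int) -> Optional[Bounds]:
    """Toy rule: any region touching the template [30, 70) needs all of it."""
    return (30, 70) if start < 70 and end > 30 else None


def to_statement(start: int, end: int) -> Optional[Bounds]:
    """Toy rule: statements are [0, 35) and [35, 100); cover whole ones."""
    new = (0 if start < 35 else 35, 35 if end <= 35 else 100)
    return None if new == (start, end) else new


print(extend_region(50, 51, [to_template, to_statement]))  # prints: ((0, 100), 4)
```

Here the first pass already reaches the final bounds, yet a second, confirming pass is needed before the loop can stop: four calls instead of the two that a call-each-once scheme would make. Whether that extra pass matters in practice depends on how costly each extender is, which the toy cannot tell.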

> > Once you do that, you're effectively doing what CC Mode currently
> > does, so where's the speed advantage coming from?

> From doing things more lazily, is how I see it. But I'm not an expert
> on CC Mode architecture.

> Among other benefits, moving it to a standard-ish framework like s-p-f
> could (possibly) simplify its code, as well as make it more approachable
> for other developers already familiar with how most other major modes
> are written. So far I wouldn't even know where to start fixing bugs in
> it, and IMHO CC Mode currently has bus factor = 1. It's not great for
> its future. I suspect it's not ideal for you either.

I don't think the syntax-propertize mechanism is all that brilliant. It's
too constrained, and places too many restrictions on what can be done
with the syntax-table text property. For example (from syntax.el):

    (defvar syntax-propertize-function nil
      ;; Rather than a -functions hook, this is a -function because it's easier
      ;; to do a single scan than several scans: with multiple scans, one cannot
      ;; assume that the text before point has been propertized, so syntax-ppss
      ;; gives unreliable results (and stores them in its cache to boot, so we'd
      ;; have to flush that cache between each function, and we couldn't use
      ;; syntax-ppss-flush-cache since that would not only flush the cache but also
      ;; reset syntax-propertize--done which should not be done in this case).

From my point of view, "multiple scans" are _much_ easier. They are
prohibited here only because syntax-ppss and syntax-propertize-function
have got themselves tied up in a tight knot. One answer would be not to
use syntax-ppss inside a s-p-function (CC Mode doesn't use syntax-ppss at
all). Another answer would be to give the s-p-function the responsibility
for removing the s-t text properties.

> Simply collaborating with one other developer on an overhaul project
> (whether it succeeds or not; perhaps partially) can improve on that.

But it would take a massive amount of time.
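The syntax.el comment quoted above can be illustrated with a toy Python model. Nothing here is Emacs code: `ToyBuffer`, `depth` and `flush_cache` are hypothetical stand-ins for a buffer, syntax-ppss and syntax-ppss-flush-cache, and paren depth is used as a much-simplified parse state. A first scan caches a parse state; a second scan then adds paren properties before that position, so the cached answer is stale until the cache is flushed.

```python
class ToyBuffer:
    """Toy buffer: text, per-position paren overrides, and a parse-state cache.

    Hypothetical model only: `paren_props` stands in for syntax-table text
    properties, `depth` for syntax-ppss, and `flush_cache` for
    syntax-ppss-flush-cache.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.paren_props: dict[int, str] = {}  # position -> "(" or ")"
        self.cache: dict[int, int] = {}        # position -> paren depth there

    def syntax_at(self, i: int) -> str:
        """Syntax of char i: a property overrides the character's default."""
        ch = self.text[i]
        default = ch if ch in "()" else "."
        return self.paren_props.get(i, default)

    def depth(self, pos: int) -> int:
        """Paren depth just before `pos`, memoised in `cache`."""
        if pos not in self.cache:
            d = 0
            for i in range(pos):
                s = self.syntax_at(i)
                if s == "(":
                    d += 1
                elif s == ")":
                    d -= 1
            self.cache[pos] = d
        return self.cache[pos]

    def flush_cache(self, beg: int) -> None:
        """Forget cached states that depend on text at or after `beg`."""
        self.cache = {p: d for p, d in self.cache.items() if p <= beg}


buf = ToyBuffer("f<a, g<b>>(x)")

# Hypothetical scan 1 queries the parse state inside the template;
# the answer (computed before any '<'/'>' properties exist) is cached.
before = buf.depth(8)

# Hypothetical scan 2 marks the template brackets as parens, i.e. it
# changes syntax *before* the position cached by scan 1.
for i, ch in enumerate(buf.text):
    if ch == "<":
        buf.paren_props[i] = "("
    elif ch == ">":
        buf.paren_props[i] = ")"

stale = buf.depth(8)   # cache hit: still reflects the unpropertized text
buf.flush_cache(0)     # the flush the syntax.el comment says would be needed
fresh = buf.depth(8)
print(before, stale, fresh)  # prints: 0 0 2
```

In this toy, a propertizing pass that never consults the cache between scans (in the message's terms, not using syntax-ppss inside the s-p-function) would never see the stale answer at all.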

> Cheers.

-- 
Alan Mackenzie (Nuremberg, Germany).