From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: emacs rendering comparisson between emacs23 and emacs26.3
Date: Mon, 20 Apr 2020 21:19:41 +0000
Message-ID: <20200420211941.GC8796@ACM>
References: <83h7xvqsgc.fsf@gnu.org>
 <90749329-ccb1-f96e-29c0-b4ecbb81d1d4@yandex.ru>
 <20200407174217.GC4009@ACM>
 <50acd968-4459-2fab-1609-7869e1ed072a@yandex.ru>
 <20200408020913.GA3992@ACM>
 <20200412153458.GA5249@ACM>
 <6d65d90c-178e-87e2-68dd-236275a5e038@yandex.ru>
 <20200419171209.GA23044@ACM>
 <34fc9563-479e-f026-9640-1b70ca9885b9@yandex.ru>
In-Reply-To: <34fc9563-479e-f026-9640-1b70ca9885b9@yandex.ru>
To: Dmitry Gutov
Cc: rudalics@gmx.at, Eli Zaretskii, rrandresf@gmail.com, rms@gnu.org,
 emacs-devel@gnu.org

Hello, Dmitry.

On Sun, Apr 19, 2020 at 21:12:57 +0300, Dmitry Gutov wrote:
> On 19.04.2020 20:12, Alan Mackenzie wrote:

[ .... ]

> >>> Inserting a C++ raw string opener does typically necessitate a full
> >>> scan (a search for a matching closer), but that would also be the
> >>> case using syntax-propertize.

> >> Not really. It would just mark the opener as a string opener (maybe with
> >> some extra text property), and that's that.

> > You don't know whether it's an unterminated raw string (the usual case)
> > until you've scanned for a potential closing delimiter.

> Is C++ syntax so ambiguous? Can R"( mean something else?

C++ syntax is ambiguous in several places, but not here.
R"foo(, assuming it's not itself inside a literal, means exactly one
thing. But you don't know whether it starts an unterminated string
unless you search for a closing delimiter.

> > This affects the font locking. (An unterminated opening delimiter
> > gets font-lock-warning-face, a terminated one doesn't.)

> If everything after R"( is fontified as a string, it serves as a
> "warning" of sort as well.

That means putting syntax-table properties on every " after the R"foo(,
otherwise that string would only extend to the first such ". But it
wouldn't work well, since it wouldn't tell the user whether her string
was already closed.

> > This is the sort of feature which I'm not willing to sacrifice.

> Is it worth a full buffer scan every time you write a new raw string
> literal?

Yes, definitely.

> >> Then font-lock would fontify the following text as string contents
> >> (until the end of the window or a little bit farther). Then you type
> >> the closer, it only has to scan a little far back (it'll call
> >> syntax-ppss to find the string opener), the closer is propertized as
> >> appropriate, and that's that. No full buffer scans at any step.

> >> I recall that fontifying the rest of the buffer as text after a simple
> >> string opener could be a sore topic for you, but raw strings should be
> >> rare enough (aren't they?), or if they are not, fontification logic
> >> could opt to do something different, while syntax-table properties will
> >> be applied the "correct" way.

> > I'm not sure what you mean by "as text".

> Sorry, I meant "as string".

OK. Same procedure for a simple string - if it's a terminated string
the " gets font-lock-string-face, if it's not it gets f-l-warning face.

> > I've no reason to think raw strings are at all rare. I've had
> > several bug reports for them. I'm not sure what you mean by
> > "fontification logic ... something different" - do you mean in the
> > raw string case?

> I mean that if a raw string is unterminated, the default behavior should
> be to fontify the rest of the buffer as string. But then again, you
> could choose some different highlighting in font-lock rules.

The current strategy is to fontify the unterminated R"foo( with
warning-face, and let the devil deal with the rest of the string (i.e.
no attempt is made to apply syntax-table properties). The first portion
of the raw string will indeed get string-face. As soon as the closing
delimiter is typed, the warning-face is removed from the opener and
syntax-table text properties applied throughout the string. The entire
string then gets string-face.

[ .... ]

> >> Yes, I think before-change-functions should become empty. Or much
> >> emptier.

> > It can't become empty. after-change-functions is fine for dealing
> > with insertions, but can't do much after a deletion. Consider the
> > case where you're in a string and all you know is that 5 characters
> > have been deleted. Those characters might have been )foo", so after
> > checking the beginning of the string starts off with R"bar(, you've
> > then got to scan a long way forward looking for )bar". Effectively
> > every deletion within a string would involve scanning to the end of
> > that string.

> This is an example of extra complexity you have to retain to implement
> the above feature the way you do.

It will become more complex and slower, if information from
before-change-functions is ignored, or discarded. The alternative is,
after each deletion, to scan forward checking that the terminating
delimiter still exists. This is slower and more complicated than
checking in b-c-f whether it's about to be removed.

> It's probably also an example of how before/after-change-functions
> essentially duplicate the knowledge of language syntax. I'm guessing
> here, but to make it work like that, you need to have multiple functions
> "understand" the raw string syntax.

b/a-c-f implement the language syntax.
It's one of the places the language is codified. The mechanism is in
several functions, yes. If you're interested, go into cc-engine.el and
search for "raw string".

> Whereas with syntax-propertize-function, that knowledge is concentrated
> in one place (maybe two, if font-lock rules do something unusual). This
> way, the code is simplified.

No, it gets complicated, assuming no loss of functionality. A given
amount of functionality would get squashed into a smaller place. The
current implementation (of C++ raw strings) is optimised such that
normal insertion and deletion don't cause the s-t properties on the
entire string to be modified. That requires details of the buffer before
the insertions and deletions.

[ .... ]

> >>> Maybe, but with a slowdown. More of these properties will get erased
> >>> than needed (with nested template forms), and they will all need to get
> >>> put back again.

> >> We won't really know until we can measure the result.

> > What's the point in investing all the effort to make the change, when
> > there's not even a prediction of a speed up?

> In principle, the speed-up will come from:

> - Deferred execution (where several buffer changes can be handled
> together and not right away),

I've never been wholly convinced by laziness. Sooner or later these
changes need to be handled, and delaying them is not going to accelerate
them.

> - No parsing the buffer contents much farther than the current window,
> in most cases. Which can speed up the majority of user actions. The
> exceptions will remain slower, but that is often a good tradeoff.

This will involve loss of functionality, as already noted. And bugs;
whilst typing in normal text, CC Mode has to search backwards for a safe
place, otherwise context fontification can mess things up. This is an
area where optimisation would be useful.

> > And I'm not sure where the proof of the syntax-propertize mechanism
> > being helpful is.
> > Has anybody but its originator positively chosen
> > to use it, whilst being aware of the alternatives?

> The alternatives being reinventing the relevant logic from zero in each
> major mode? And writing syntax caching logic each time?

Or writing and using a better framework. The question remains: has
anybody other than Stefan M. freely chosen to use syntax-ppss and
syntax-propertize-function, whilst being aware of their disadvantages
and of alternatives?

Remember that for an extended period of time syntax-ppss didn't work
properly, and even now it doesn't do the right thing in narrowed
buffers, at least for a programming mode such as CC Mode. The
syntax-propertize mechanism erases s-t p's in a manner not under the
control of the major mode, which means the major mode needs to implement
workarounds (which are liable to be slow).

> > To become usable for CC Mode, it would need to provide something on
> > before-change-functions to complement what's on a-c-f, and it would need
> > to provide some control to the major mode over which syntax-table
> > properties get erased.

> Not something I can comment on.

Hmmm.

-- 
Alan Mackenzie (Nuremberg, Germany).
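
For readers who want to see why an opener such as R"foo( cannot be
classified without a forward scan, here is a minimal sketch in Python.
It is not CC Mode's code (that is Emacs Lisp, in cc-engine.el); the
function name and return shape are hypothetical, and it assumes the
opener is not itself inside a comment or another literal.

```python
import re

# A C++ raw string opener is R"delim( where delim is at most 16
# characters and contains no spaces, parentheses, backslashes or
# double quotes. This pattern is an approximation for illustration.
OPENER = re.compile(r'R"([^ ()\\"\t\n\v\f]{0,16})\(')


def classify_raw_string(text, start=0):
    """Find the first raw string opener at or after START in TEXT.

    Returns None if there is no opener. Otherwise returns a tuple
    (state, opener_pos, end_pos), where state is "terminated" or
    "unterminated" and end_pos is None for an unterminated string.
    """
    m = OPENER.search(text, start)
    if m is None:
        return None
    # Only )delim" with the SAME delimiter closes the string, so the
    # whole remainder of the text may have to be searched.
    closer = ")" + m.group(1) + '"'
    end = text.find(closer, m.end())
    if end == -1:
        return ("unterminated", m.start(), None)
    return ("terminated", m.start(), end + len(closer))
```

An ordinary " inside the string does not end it, and a closer with a
different identifier (say )bar" after R"foo() does not either, which
is why the search can run to the end of the text. Deleting the
matching closer flips the result from "terminated" to "unterminated",
and in this sketch that is only discovered by rescanning.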