From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: [Emacs-diffs] comment-cache 223d16f 2/3: Apply `comment-depth' text properties when calling `back_comment'.
Date: Mon, 14 Mar 2016 17:29:40 +0000
Message-ID: <20160314172940.GG1894@acm.fritz.box>
References: <20160309174816.GE3948@acm.fritz.box> <56E0805F.3050804@gmx.at>
 <20160312170839.GE2572@acm.fritz.box> <20160312215839.GC10781@acm.fritz.box>
 <20160313175922.GE1871@acm.fritz.box>
 <0ce1b5a5-6892-47ad-03d4-d4c2ba2bea54@yandex.ru>
 <20160314122330.GC1894@acm.fritz.box>
To: Dmitry Gutov
Cc: emacs-devel@gnu.org

Hello, Dmitry.

On Mon, Mar 14, 2016 at 06:15:13PM +0200, Dmitry Gutov wrote:
> On 03/14/2016 02:23 PM, Alan Mackenzie wrote:

> >> Any performance numbers on using this approach in C/l mode?

> > [ By "this approach" I'm taking it you mean scanning forward from safe
> > positions.]

> No, using syntax-ppss's cache. Including the syntax-ppss-last variable.

CC Mode doesn't use syntax-ppss. It would be too much work to put it in,
particularly as that function's future is unclear.

> IIRC, Stefan posted a patch, which was like 10 lines long.

Sorry, patch for what? I've lost the context.

> Could you please try it already, so we can move on to discussing the
> actual performance problems of syntax-ppss, instead of theoretical ones?

It would be a lot of work. Perhaps you might like to undertake it.

> > It's too slow to use as a replacement for back_comment, in the sense I
> > would like to do - in place of scanning a comment backward character by
> > character, I want to use a cache (calculated by forward scanning) to
> > determine the beginning of a comment. Having to scan forward lots of
> > characters (as opposed to a few) is out of the question for this
> > application.

> The approach X is out of the question, because my approach Y is usually
> N times faster!

> Why is it out of the question? Can I say "premature optimization"?

You could, but it would sound a bit silly. back_comment currently works
by scanning backward a character at a time. Each character will take
about the same time to scan (give or take a factor of, perhaps, 2) as a
character being scanned forward in parse-partial-sexp. Compare scanning
backwards over a 100 character comment using the current back_comment
with scanning forwards up to 20,000 characters to get a parse state on
the comment's end position.

CC Mode does a fair bit of scanning backwards of comments sequentially.
Even syntax-last-pos (or whatever it's called) won't help much here. But
if you want to do the experiment, be my guest.

> >> So I have to wonder why the "get out of a comment" feature is used in
> >> C/l mode so much that it becomes a bottleneck, and you get significant
> >> improvement in performance by dropping the caching logic to C. That is,
> >> of course, not a nice thing to ask considering the overall complexity of
> >> CC Mode, but still.

> >> I don't see anything comparable to 10 second waiting described in
> >> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22884, when doing a
> >> comparable operation in a 5000-line Ruby file.

> > Here's a quick summary of how that happened: on L1661 of config.h, C
> > Mode had to scan back to the beginning of a statement. That involved
> > going back over ~1600 lines of comments and preprocessor constructs.

> I went back to the revision a589e9a and rebuilt Emacs.
> I see the problem
> described in 22884. However, the line 1661 already is a beginning of a
> statement. It contains:

>    #define _DARWIN_USE_64_BIT_INODE 1

> Are we talking about the same file?

Possibly. config.h is different between different runs of ./configure.
In the file Paul was complaining about, there was a comment opener at
the start of the line. I think it was actually the line you've cited,
but commented out.

> > When it got to the critical comment and tried to go back over it,
> > (forward-comment -1) said nil, because of that paren in column 0. That
> > paren in column 0, although in a comment, was deemed to be the start of
> > a defun. C Mode was then trying to parse "code" over a region with a
> > 1600 line gap in the middle.
> > Hence the 10 second delay in seeing the
> > character echoed. Paul has purged our code of parens in column 0. But
> > it would be nice not to have the restriction.

> (parse-partial-sexp 1 (point)) on line 1661 takes ~0.002s here.
> (parse-partial-sexp 1 (point-max)) takes just a little above that.

> How do these timings translate into whole seconds of waiting after
> pressing '/'?

They don't. It was CC Mode's indentation engine's scanning, not the raw
parse-partial-sexp scanning.

> > My point was that it is so simple that it _could_ be written in C, and
> > that without any great difficulties.

> "It was so simple that I made it more complex"? I mentioned C as a
> drawback, and it is.

Lots of Emacs is written in C. In this particular case, it is a good
idea to avoid the possible complications and pitfalls of calling Lisp
from C when it is not necessary, particularly given the lack of any
involved code which would make Lisp the preferred language.

> > Parts of it can only be written in
> > C (the bits that ensure the cache is marked stale when certain
> > syntax-table text properties are set/cleared when
> > `inhibit-modification-hooks' is bound to non-nil).

> These would have to be carefully considered, but if they make sense,
> they would have to be ported to syntax-ppss too somehow.

That wouldn't be a bad idea, once the future of that function becomes
clear.

-- 
Alan Mackenzie (Nuremberg, Germany).
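
[ Illustrative aside on the cost comparison in this message: a 100
character backward scan versus a forward scan of up to 20,000 characters.
What follows is a toy Python model, not Emacs code. It only counts
characters visited, assumes plain /* ... */ comments with no strings,
nesting or escapes, and every name in it (scan_backward, scan_forward)
is hypothetical. The dictionary it returns merely stands in for "a cache
calculated by forward scanning"; it is not how the comment-cache branch
or syntax-ppss stores anything. ]

```python
from __future__ import annotations


def scan_backward(text: str, end: int) -> tuple[int, int]:
    """Walk backward from `end` (just after a closing "*/") to its "/*".

    Hypothetical stand-in for a character-by-character backward scan.
    Returns (comment_start, characters_visited).
    """
    i = end - 2      # index of the '*' in the closing "*/"
    visited = 2      # the two closer characters
    while i > 0:
        i -= 1
        visited += 1
        if text.startswith("/*", i):
            return i, visited
    raise ValueError("no comment opener found")


def scan_forward(text: str, begin: int, end: int) -> tuple[dict[int, int], int]:
    """Scan text[begin:end] forward; `begin` must lie outside any comment.

    Hypothetical stand-in for a forward parse from a "safe" position.
    Returns ({position just after "*/": position of "/*"}, characters_visited).
    """
    ends_to_starts: dict[int, int] = {}
    visited = 0
    i, start = begin, None
    while i < end:
        if start is None and text.startswith("/*", i):
            start, step = i, 2          # entering a comment
        elif start is not None and text.startswith("*/", i):
            ends_to_starts[i + 2] = start
            start, step = None, 2       # leaving the comment
        else:
            step = 1
        i += step
        visited += step
    return ends_to_starts, visited


if __name__ == "__main__":
    # 19,900 characters of non-comment text, then a 100 character comment.
    text = "x" * 19_900 + "/*" + "c" * 96 + "*/"
    end = len(text)
    start, cost_back = scan_backward(text, end)
    table, cost_fwd = scan_forward(text, 0, end)
    print("backward scan visited", cost_back, "characters")
    print("forward scan visited", cost_fwd, "characters")
    # Once `table` exists, finding the comment start is a single lookup.
    print("cached lookup agrees:", table[end] == start)
```

In this model the forward pass costs about as much as the distance back
to the safe position, while the backward scan costs about as much as the
comment is long. A table built once by a forward pass makes each later
lookup cheap, but it has to be discarded or repaired whenever the text
before a recorded position changes, which the sketch does not attempt.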