From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Bug #25608 and the comment-cache branch
Date: Tue, 21 Feb 2017 22:53:06 -0500
To: Dmitry Gutov
Cc: emacs-devel@gnu.org
In-Reply-To: <195629e9-11d6-2fb6-4c9d-39c8a244e2ec@yandex.ru> (Dmitry Gutov's message of "Wed, 22 Feb 2017 04:25:53 +0200")

> I see, thanks. And I think that means that, ideally, it would work without
> the caller having to adjust the syntax visibility bounds, or the like, as
> long as the syntax table is correct and the beginning (or the end) of the
> currently navigated comment is within view.

Right, but not reliably so: very often we need to parse backward not just
until the matching starter but until the previous closer (to make sure
the starter we saw was not itself within an earlier comment), and in
other cases the mix of comment markers and string markers makes it
impossible to guess if we were really inside a comment, so we end up
falling back on the forward-parse code.

>> In the case we do scan forward (e.g. the case where we end up using
>> parse-partial-sexp (or syntax-ppss in my patch)), we actually manually
>> re-introduce that behavior: if the forward parse says that the
>> end-comment-marker is inside a string (or inside another comment), we
>> re-parse from the beginning of that string (or comment) to try and see
>> if that end-comment-marker could be considered to close a comment nested
>> within the string (or the other comment).
> That indeed sounds complex.

Actually, it's very straightforward: the forward parse already gives us
the beginning of the surrounding element, so we just re-do the forward
parse from that spot.  It's just a matter of wrapping the code inside
a loop.

>> Calling syntax-ppss every time back_comment is invoked would probably
>> result in bad performance currently: when parsing backward
>> (e.g. backward-sexp), the syntax-ppss-last optimization is ineffective,
>> so we'd fall back on syntax-ppss-cache which ends up scanning on the
>> average syntax-ppss-max-span/2 (i.e. 10K) chars.  When \n is a comment
>> ender (i.e. in most programming language modes), it would imply
>> a forward scan of 10K for every line.
> You're probably right, but I wonder what the benchmarks would say.
> (parse-partial-sexp 1 10000) takes 0.0005 seconds here, so it'd still
> require some intensive usage to show up on user's radar.

> Previously, we started from the beginning of the current defun, as
> delineated by an open paren in the first column, right?

No.  "Previously", we typically scan the line backward and stop as soon
as we hit the previous \n (which tells us that no comment can start
earlier than that if it finishes with a \n).  In a few cases, we do fall
back on the forward-parse code, in which case indeed we'll take longer,
but those are normally rare (which is why this comment-cache and my
syntax-ppss-patch haven't been installed yet: they improve the
performance of a case that's somewhat infrequent).

> Perhaps we could use the "generic comment bounds" syntax-table property to
> delineate such difficult comments. If that idea sounds similar to
> comment-cache, that is no accident.

Maybe.  Obviously, my syntax-ppss hammer makes me think that such
alternate solutions aren't needed: syntax-ppss solves this case without
having to try and come up with a clever way to detect which comments
are tricky or how to mark them.

> I've only recently come to the realization that our usage of the
> syntax-table text property has the same general incompatibility with mixed
> mode buffers as comment-cache does. The only reasons why it doesn't show as
> much is because we use them relatively rarely. But we couldn't, for
> instance, apply a "generic string" syntax to some literal in a subregion
> that is inside a "generic string" belonging to the primary major mode.

Indeed.

> Not sure what to do about that.

Not completely sure either.  I've had vague ideas of adding some kind of
hook to syntax-tables, i.e. add a new kind of syntax element which ends
up calling an Elisp function of your choice so you can make it "do the
right thing" for the particular construct.  So when scanning (forward or
backward), if we bump into an element with that syntax (typically
applied as a syntax-table text-property), we call the function which
will know how to jump over the sub-region or will signal an "end of
sub-region" error.


        Stefan