From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Stefan Monnier <monnier@iro.umontreal.ca>
Newsgroups: gmane.emacs.devel
Subject: Re: Bug #25608 and the comment-cache branch
Date: Tue, 21 Feb 2017 22:53:06 -0500
Message-ID: <jwvr32qj3xw.fsf-monnier+emacs@gnu.org>
References: <20170202202418.GA2505@acm> <83lgtouxpf.fsf@gnu.org>
	<20170202215154.GB2505@acm> <83h94bvhzw.fsf@gnu.org>
	<20170203172952.GC2250@acm>
	<0a40d539-b7bc-2655-5429-6280022106ee@yandex.ru>
	<20170204102410.GA2047@acm>
	<8f9e68fc-4314-625d-b4bf-796c71c91798@yandex.ru>
	<20170206192423.GB3568@acm>
	<4f0fabf3-be9c-7492-379b-59dc93e72b4f@yandex.ru>
	<20170207192119.GA2490@acm>
	<424e6409-029c-d15d-421c-4fb90594329c@yandex.ru>
	<jwv4lzwn46i.fsf-monnier+emacs@gnu.org>
	<195629e9-11d6-2fb6-4c9d-39c8a244e2ec@yandex.ru>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: blaine.gmane.org 1487736540 8136 195.159.176.226 (22 Feb 2017 04:09:00 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Wed, 22 Feb 2017 04:09:00 +0000 (UTC)
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)
Cc: emacs-devel@gnu.org
To: Dmitry Gutov <dgutov@yandex.ru>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Feb 22 05:08:56 2017
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1cgOEY-0001bj-CD
	for ged-emacs-devel@m.gmane.org; Wed, 22 Feb 2017 05:08:54 +0100
Original-Received: from localhost ([::1]:49770 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1cgOEe-00037n-90
	for ged-emacs-devel@m.gmane.org; Tue, 21 Feb 2017 23:09:00 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:44190)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <monnier@iro.umontreal.ca>) id 1cgNzO-0005qe-VU
	for emacs-devel@gnu.org; Tue, 21 Feb 2017 22:53:16 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <monnier@iro.umontreal.ca>) id 1cgNzL-0006KH-1a
	for emacs-devel@gnu.org; Tue, 21 Feb 2017 22:53:15 -0500
Original-Received: from ironport2-out.teksavvy.com ([206.248.154.181]:2585)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <monnier@iro.umontreal.ca>)
	id 1cgNzK-0006JJ-Q0
	for emacs-devel@gnu.org; Tue, 21 Feb 2017 22:53:10 -0500
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: A0BcCQBECq1Y/5iQSC1eGgEBAQECAQEBAQgBAQEBg1FBhDaFVoVykSQpAZcXhhwEAgKCcEQUAQIBAQEBAQEBYiiEcQEEAVYjBQsLDiYSFBgNJC6JSwixUos+AQEBAQYCASWLO4o5BZBHhRuGKZxUhlqTJDYhgQAgFAgshyQiim8BAQE
X-IPAS-Result: A0BcCQBECq1Y/5iQSC1eGgEBAQECAQEBAQgBAQEBg1FBhDaFVoVykSQpAZcXhhwEAgKCcEQUAQIBAQEBAQEBYiiEcQEEAVYjBQsLDiYSFBgNJC6JSwixUos+AQEBAQYCASWLO4o5BZBHhRuGKZxUhlqTJDYhgQAgFAgshyQiim8BAQE
X-IronPort-AV: E=Sophos;i="5.35,192,1484024400"; d="scan'208";a="293484130"
Original-Received: from 45-72-144-152.cpe.teksavvy.com (HELO pastel.home)
	([45.72.144.152])
	by smtp.teksavvy.com with ESMTP; 21 Feb 2017 22:53:06 -0500
Original-Received: by pastel.home (Postfix, from userid 20848)
	id B4A8D6595F; Tue, 21 Feb 2017 22:53:06 -0500 (EST)
In-Reply-To: <195629e9-11d6-2fb6-4c9d-39c8a244e2ec@yandex.ru> (Dmitry Gutov's
	message of "Wed, 22 Feb 2017 04:25:53 +0200")
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-Received-From: 206.248.154.181
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:212536
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/212536>

> I see, thanks. And I think that means that, ideally, it would work without
> the caller having to adjust the syntax visibility bounds, or the like, as
> long as the syntax table is correct and the beginning (or the end) of the
> currently navigated comment is within view.

Right, but not reliably so: very often we need to parse backward not
just until the matching starter but until the previous closer (to make
sure the starter we saw was not itself within an earlier comment), and
in other cases the mix of comment markers and string markers makes it
impossible to guess if we were really inside a comment, so we end up
falling back on the forward-parse code.

>> In the case we do scan forward (e.g. the case where we end up using
>> parse-partial-sexp (or syntax-ppss in my patch)), we actually manually
>> re-introduce that behavior: if the forward parse says that the
>> end-comment-marker is inside a string (or inside another comment), we
>> re-parse from the beginning of that string (or comment) to try and see
>> if that end-comment-marker could be considered to close a comment nested
>> within the string (or the other comment).
> That indeed sounds complex.

Actually, it's very straightforward: the forward parse already gives us
the beginning of the surrounding element, so we just re-do the forward
parse from that spot.  It's just a matter of wrapping the code inside
a loop.
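
To make that loop concrete, here is a toy model in Python.  It is only
a sketch under assumptions, not the code in syntax.c: the "language" has
just `"` strings and non-nesting `/* */` comments, and the names
`parse_state` and `find_comment_start` are made up for the illustration.

```python
def parse_state(text, start, end):
    """Scan forward from START (assumed to be outside any string or
    comment) up to END.  Return (kind, begin) for the element enclosing
    END, where kind is None, "string" or "comment"."""
    kind, begin = None, None
    i = start
    while i < end:
        if kind is None:
            if text[i] == '"':
                kind, begin = "string", i
            elif text.startswith("/*", i):
                kind, begin = "comment", i
                i += 1          # skip the second char of the opener
        elif kind == "string":
            if text[i] == '"':
                kind, begin = None, None
        else:                   # inside a comment
            if text.startswith("*/", i):
                kind, begin = None, None
                i += 1          # skip the second char of the closer
        i += 1
    return kind, begin


def find_comment_start(text, closer_pos, origin=0):
    """Return the start of the comment closed by the `*/` at CLOSER_POS,
    or None if that closer does not end any comment."""
    while True:
        kind, begin = parse_state(text, origin, closer_pos)
        if kind == "comment":
            return begin
        if kind is None:
            return None
        # The closer is inside a string: re-do the forward parse from
        # just inside that string, treating its contents as code.
        origin = begin + 1


text = 'x = "a /* b */ c"'
print(find_comment_start(text, text.index("*/")))
```

Each iteration restarts strictly later than the previous one, so the
loop terminates.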

>> Calling syntax-ppss every time back_comment is invoked would probably
>> result in bad performance currently: when parsing backward
>> (e.g. backward-sexp), the syntax-ppss-last optimization is ineffective,
>> so we'd fall back on syntax-ppss-cache which ends up scanning on the
>> average syntax-ppss-max-span/2 (i.e. 10K) chars.  When \n is a comment
>> ender (i.e. in most programming language modes), it would imply
>> a forward scan of 10K for every line.

> You're probably right, but I wonder what the benchmarks would say.

> (parse-partial-sexp 1 10000) takes 0.0005 seconds here, so it'd still
> require some intensive usage to show up on user's radar.

> Previously, we started from the beginning of the current defun, as
> delineated by an open paren in the first column, right?

No.  "Previously", we typically scan the line backward and stop as soon
as we hit the previous \n (which tells us that no comment can start
earlier than that if it finishes with a \n).
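
A rough Python sketch of that common case, under the assumption of a toy
language where `#` starts a comment and `\n` ends it.  The function name
is hypothetical, and the sketch deliberately ignores strings, which is
precisely what makes the real backward scan ambiguous in some cases.

```python
def line_comment_start(text, newline_pos):
    """Scan backward from the newline at NEWLINE_POS and return the
    position of the `#` that starts the comment it closes, or None.
    The scan never goes past the previous newline."""
    start = None
    i = newline_pos - 1
    while i >= 0 and text[i] != "\n":
        if text[i] == "#":
            start = i           # an earlier `#` on the same line wins
        i -= 1
    return start


text = "a = 1\nb = 2  # note # more\nc\n"
second_newline = text.index("\n", text.index("#"))
print(line_comment_start(text, second_newline))
```

The cost is bounded by the length of one line, whatever the size of the
buffer.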

In a few cases, we do fall back on the forward parse code, in which case
indeed we'll take longer, but those are normally rare (which is why this
comment-cache and my syntax-ppss-patch haven't been installed yet: they
improve the performance of a case that's somewhat infrequent).

> Perhaps we could use the "generic comment bounds" syntax-table property to
> delineate such difficult comments. If that idea sounds similar to
> comment-cache, that is no accident.

Maybe.  Obviously, my syntax-ppss hammer makes me think that such
alternate solutions aren't needed: syntax-ppss solves this case without
having to try and come up with a clever way to detect which comments
are tricky or how to mark them.

> I've only recently come to the realization that our usage of the
> syntax-table text property has the same general incompatibility with mixed
> mode buffers as comment-cache does. The only reasons why it doesn't show as
> much is because we use them relatively rarely. But we couldn't, for
> instance, apply a "generic string" syntax to some literal in a subregion
> that is inside a "generic string" belonging to the primary major mode.

Indeed.

> Not sure what to do about that.

Not completely sure either.  I've had vague ideas of adding some kind of
hook to syntax-tables, i.e. add a new kind of syntax element which ends
up calling an Elisp function of your choice so you can make it "do the
right thing" for the particular construct.

So when scanning (forward or backward), if we bump into an element with
that syntax (typically applied as a syntax-table text-property), we call
the function which will know how to jump over the sub-region or will
signal an "end of sub-region" error.
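
Purely as an illustration of the idea (no such syntax class exists, and
every name below is hypothetical), here is a toy Python scanner that
looks for a matching close paren and consults per-position hooks, which
stand in for the syntax-table text property.  The `<% ... %>` sub-region
markers are an assumption made for the example.

```python
class EndOfSubRegion(Exception):
    """Raised when a scan started inside a sub-region reaches its end."""


def jump_over(text, pos):
    # Hook placed on the opener: skip the whole `<% ... %>` sub-region.
    end = text.find("%>", pos)
    return len(text) if end < 0 else end + 2


def end_of_region(text, pos):
    # Hook placed on the closer: scanning must not leave the sub-region.
    raise EndOfSubRegion(pos)


def matching_close(text, open_pos, hooks):
    """Return the position of the `)` matching the `(` at OPEN_POS,
    calling hooks[pos](text, pos) whenever the scan reaches a hooked
    position.  A hook returns the position at which to resume."""
    depth = 0
    i = open_pos
    while i < len(text):
        if i in hooks:
            i = hooks[i](text, i)
            continue
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return None


text = "(a <% ) %> b)"
hooks = {text.index("<%"): jump_over, text.index("%>"): end_of_region}
print(matching_close(text, 0, hooks))
```

With the hooks in place the stray `)` inside the sub-region is never
seen by the outer scan, while a scan that starts inside the sub-region
gets the error instead of escaping into the surrounding text.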


        Stefan