From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Dmitry Gutov <dgutov@yandex.ru>
Newsgroups: gmane.emacs.devel
Subject: Re: Bug #25608 and the comment-cache branch
Date: Thu, 23 Feb 2017 16:23:39 +0200
Message-ID: <9f4ac18b-6453-bee8-83d8-f452392adbb9@yandex.ru>
References: <20170202202418.GA2505@acm> <83lgtouxpf.fsf@gnu.org>
	<20170202215154.GB2505@acm> <83h94bvhzw.fsf@gnu.org>
	<20170203172952.GC2250@acm>
	<0a40d539-b7bc-2655-5429-6280022106ee@yandex.ru>
	<20170204102410.GA2047@acm>
	<8f9e68fc-4314-625d-b4bf-796c71c91798@yandex.ru>
	<20170206192423.GB3568@acm>
	<4f0fabf3-be9c-7492-379b-59dc93e72b4f@yandex.ru>
	<20170207192119.GA2490@acm>
	<424e6409-029c-d15d-421c-4fb90594329c@yandex.ru>
	<jwv4lzwn46i.fsf-monnier+emacs@gnu.org>
	<195629e9-11d6-2fb6-4c9d-39c8a244e2ec@yandex.ru>
	<jwvr32qj3xw.fsf-monnier+emacs@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: blaine.gmane.org 1487859851 10889 195.159.176.226 (23 Feb 2017 14:24:11 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Thu, 23 Feb 2017 14:24:11 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.0
Cc: emacs-devel@gnu.org
To: Stefan Monnier <monnier@iro.umontreal.ca>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Feb 23 15:24:00 2017
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1cguJK-0001mV-Fr
	for ged-emacs-devel@m.gmane.org; Thu, 23 Feb 2017 15:23:58 +0100
Original-Received: from localhost ([::1]:59014 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1cguJQ-0001ZT-E1
	for ged-emacs-devel@m.gmane.org; Thu, 23 Feb 2017 09:24:04 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:33481)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <raaahh@gmail.com>) id 1cguJE-0001YL-8D
	for emacs-devel@gnu.org; Thu, 23 Feb 2017 09:23:53 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <raaahh@gmail.com>) id 1cguJ7-0005lj-Vj
	for emacs-devel@gnu.org; Thu, 23 Feb 2017 09:23:52 -0500
Original-Received: from mail-wr0-x242.google.com ([2a00:1450:400c:c0c::242]:34054)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <raaahh@gmail.com>) id 1cguJ7-0005lF-M9
	for emacs-devel@gnu.org; Thu, 23 Feb 2017 09:23:45 -0500
Original-Received: by mail-wr0-x242.google.com with SMTP id 89so4094297wrr.1
	for <emacs-devel@gnu.org>; Thu, 23 Feb 2017 06:23:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=sender:subject:to:cc:references:from:message-id:date:user-agent
	:mime-version:in-reply-to:content-language:content-transfer-encoding;
	bh=EyZ8g75MkOi00dBSXGyn37F+sVXe9fcYjHgFASoMyiU=;
	b=th1uUlgmcstuN89kW85rX6OjXKPCdEZPGacBkyFgSAKz/qiogHT1QNN74FNH+fUcSC
	/9rAesq6OeqRIcl0YLfIg9WcpvnWc2ihKzdvifoG4Tid5VwKlSLeTsMsgqxaHZ9n1w6N
	5VICViLAx3w68uSYUe5RJa7J5rKCmmdu1y9RtjdYDojce9UgujmEmkXbVsHx19Arcn+B
	0seZlPz/AWjE7r4cEGeDaZK9JrG62m0sm2fsoqPW0TFij3dSs3vllefulpPZFvsW6+0s
	T1JgZihs5MyIJQbNuHkULa8iAY6SYgV4DIn3exUKnrV4eeYsBN50CgPTZKXEJCcg0ua0
	vuKw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:sender:subject:to:cc:references:from:message-id
	:date:user-agent:mime-version:in-reply-to:content-language
	:content-transfer-encoding;
	bh=EyZ8g75MkOi00dBSXGyn37F+sVXe9fcYjHgFASoMyiU=;
	b=mdEPRhsw4nz/xsiOlCFuvKcvwoOLtbM/JDbDbIo5nd14f8uEMoIkOtNUi9ErGAxho6
	fam1X9FIRMF3lS0Kolwnj1jS0Wlnq0bujfkpeq3yRwK5aAyBLqWTKGcCbgGTDi2rlCzD
	M2WNWtDoLwbryqj4AJOqGrLoMTcfaOomEn/0KR1WG8k1zuzTOVNJn/sQfxNNr18uJnOC
	pywsak74CNW89pY+LY9uYnWxvBahWdV+kz6ib+mWlUBJ6MoU2yxevXLjuL7fRu+Brzdl
	iAF2PBQopqZMLJCnAUN6o3MvbChvYLbNGbHAuZI5eOABu1/eBd/cyRRaYrjLF+W9K1Ir
	yhRw==
X-Gm-Message-State: AMke39l9HrJK3O9nu24/0dEEKTkrOiaYAqona32anVBm7buBhp51G2W4j4AggvwGI8sICw==
X-Received: by 10.223.141.229 with SMTP id o92mr31264568wrb.22.1487859822130; 
	Thu, 23 Feb 2017 06:23:42 -0800 (PST)
Original-Received: from [192.168.0.133] ([212.50.99.193])
	by smtp.googlemail.com with ESMTPSA id
	b8sm3569310wrb.9.2017.02.23.06.23.40
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 23 Feb 2017 06:23:41 -0800 (PST)
In-Reply-To: <jwvr32qj3xw.fsf-monnier+emacs@gnu.org>
Content-Language: en-US
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2a00:1450:400c:c0c::242
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:212544
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/212544>

On 22.02.2017 05:53, Stefan Monnier wrote:
>> I see, thanks. And I think that means that, ideally, it would work without
>> the caller having to adjust the syntax visibility bounds, or the like, as
>> long as the syntax table is correct and the beginning (or the end) of the
>> currently navigated comment is within view.
> 
> Right, but not reliably so: very often we need to parse backward not
> just until the matching starter but until the previous closer (to make
> sure the starter we saw was not itself within an earlier comment), and
> in other cases the mix of comment markers and string markers make it
> impossible to guess if we were really inside a comment, so we end up
> falling back on the forward-parse code.

Naturally, we'd need to save more information to be able to do that. 
E.g. propertize the end of a complex comment with the position of its 
beginning. Since the first time we go through a buffer is in the forward 
direction, getting that info would be inexpensive.
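
As a rough illustration of that idea, here is a toy model in Python, not 
Emacs's actual scanning code. The function names and the C-like toy 
syntax (/* */ and // comments, "..." strings) are assumptions made for 
this sketch only. One forward pass records, for each comment end, where 
that comment began, so a later backward motion can look the start up 
instead of re-parsing:

```python
def index_comment_ends(text):
    """Forward scan: map each comment's end offset to its start offset.

    Toy syntax (an assumption of this sketch): /* ... */ block comments,
    // line comments ending at a newline, and "..." strings with
    backslash escapes. End offsets are exclusive.
    """
    start_of = {}
    i, n = 0, len(text)
    while i < n:
        if text.startswith("/*", i):
            close = text.find("*/", i + 2)
            end = n if close == -1 else close + 2
            start_of[end] = i
            i = end
        elif text.startswith("//", i):
            nl = text.find("\n", i)
            end = n if nl == -1 else nl + 1
            start_of[end] = i
            i = end
        elif text[i] == '"':
            # Skip the string so comment markers inside it are ignored.
            i += 1
            while i < n and text[i] != '"':
                i += 2 if text[i] == "\\" else 1
            i += 1  # step over the closing quote
        else:
            i += 1
    return start_of


def comment_start_before(start_of, pos):
    """Return the start of the comment ending exactly at pos, or None."""
    return start_of.get(pos)
```

The dictionary stands in for a text property placed on the comment 
ender. Comment markers inside strings, or inside another comment, never 
get an entry, because the forward pass already knows the context.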

> Actually, it's very straightforward: the forward parse already gives us
> the beginning of the surrounding element, so we just re-do the forward
> parse from that spot.  It's just a matter of wrapping the code inside
> a loop.

You're likely a better judge of that. It does sound a bit convoluted to 
me (and having to deal with different kinds of comments adds its own 
complexity), but nothing that a handful of tests couldn't keep 
straight.

> No.  "Previously", we typically scan the line backward and stop as soon
> as we hit the previous \n (which tells us that no comment can start
> earlier than that if it finishes with a \n).
> 
> In a few cases, we do fallback on the forward parse code, in which case
> indeed we'll take longer, but those are normally rare (which is why this
> comment-cache and my syntax-ppss-patch haven't been installed yet: they
> improve the performance of a case that's somewhat infrequent).

I see, thanks.

>> Perhaps we could use the "generic comment bounds" syntax-table property to
>> delineate such difficult comments. If that idea sounds similar to
>> comment-cache, that is no accident.
> 
> Maybe.  Obviously, my syntax-ppss hammer makes me think that such
> alternate solutions aren't needed: syntax-ppss solves this case without
> having to try and come out with a clever way to detect which comments
> are tricky nor how to mark them.

The alternative tweak I had in mind would be applied somewhere around 
syntax-propertize. So it would be a matter of trading off one bit of 
complexity for another, still staying within the framework of syntax-ppss.

> Not completely sure either.  I've had vague ideas of adding some kind of
> hook to syntax-tables, i.e. add a new kind of syntax element which ends
> up calling an Elisp function of your choice so you can make it "do the
> right thing" for the particular construct.

I think just having paired syntactic elements would suffice. Or just 
propertizing the whole subregion with one text property span. Whichever 
would be easier to process.

Not sure about using the syntax-table property for this. In some weird 
cases there won't be a space or a newline to put these syntax-table 
values on. And a newline staying a newline might be syntactically 
important for the primary major mode somewhere.

Another thing to consider is that we would probably want to fontify the 
contents of all subregions normally, even when inside comments belonging 
to the outer mode. So the primitives used in 
font-lock-fontify-syntactically-region would need to be able to stop at 
those boundaries instead of automatically skipping over.
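
A minimal sketch of what "stopping at those boundaries" could mean, 
again as a Python toy model rather than anything in font-lock or 
syntax.c. The function name and the list of (start, end) pairs, which 
stands in for one text-property span per subregion, are hypothetical:

```python
def split_at_subregions(start, end, spans):
    """Split [start, end) into chunks that never cross a subregion boundary.

    `spans` is a sorted list of non-overlapping (sub_start, sub_end)
    pairs (hypothetical stand-in for per-subregion text properties).
    Yields (chunk_start, chunk_end, in_subregion) tuples.
    """
    pos = start
    for sub_start, sub_end in spans:
        if sub_end <= pos:
            continue  # subregion lies entirely before the current position
        if sub_start >= end:
            break     # subregion lies entirely after the requested range
        if pos < sub_start:
            yield (pos, sub_start, False)
            pos = sub_start
        chunk_end = min(sub_end, end)
        yield (pos, chunk_end, True)
        pos = chunk_end
    if pos < end:
        yield (pos, end, False)
```

A caller would then fontify each chunk on its own, with the outer 
mode's rules for the chunks outside subregions and the inner mode's 
rules for the ones inside, instead of letting one scan run across the 
boundary.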

> So when scanning (forward or backward), if we bump into an element with
> that syntax (typically applied as a syntax-table text-property), we call
> the function which will know how to jump over the sub-region or will
> signal an "end of sub-region" error.

Just having those hooks won't be enough: we still won't have enough 
info on how to syntax-propertize the subregion contents, for instance. 
So I'm not sure what the flexibility of using functions here would buy us.