From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Dmitry Gutov <dgutov@yandex.ru>
Newsgroups: gmane.emacs.devel
Subject: Re: /srv/bzr/emacs/trunk r101338: * lisp/emacs-lisp/syntax.el
	(syntax-ppss): More sanity check to catch
Date: Wed, 12 Feb 2014 04:49:15 +0200
Message-ID: <52FAE12B.6060101@yandex.ru>
References: <E1Os1zA-0006uO-PC@internal.in.savannah.gnu.org>	<87r47bi1e5.fsf@yandex.ru>
	<jwv38jr54ys.fsf-monnier+emacs@gnu.org>	<52F96284.50507@yandex.ru>
	<jwvbnydw35g.fsf-monnier+emacs@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: ger.gmane.org 1392173377 27132 80.91.229.3 (12 Feb 2014 02:49:37 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Wed, 12 Feb 2014 02:49:37 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Stefan Monnier <monnier@iro.umontreal.ca>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Feb 12 03:49:44 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WDPtP-0008ME-8v
	for ged-emacs-devel@m.gmane.org; Wed, 12 Feb 2014 03:49:43 +0100
Original-Received: from localhost ([::1]:37047 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WDPtO-0008FB-VA
	for ged-emacs-devel@m.gmane.org; Tue, 11 Feb 2014 21:49:42 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:40895)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <raaahh@gmail.com>) id 1WDPtE-0008CN-3B
	for emacs-devel@gnu.org; Tue, 11 Feb 2014 21:49:40 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <raaahh@gmail.com>) id 1WDPt3-0003iW-2J
	for emacs-devel@gnu.org; Tue, 11 Feb 2014 21:49:31 -0500
Original-Received: from mail-ee0-x234.google.com ([2a00:1450:4013:c00::234]:54468)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <raaahh@gmail.com>) id 1WDPt2-0003iL-Iu
	for emacs-devel@gnu.org; Tue, 11 Feb 2014 21:49:20 -0500
Original-Received: by mail-ee0-f52.google.com with SMTP id e53so4007297eek.11
	for <emacs-devel@gnu.org>; Tue, 11 Feb 2014 18:49:19 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=os1opCnPXxS2bExF6r43g/9pcsStXINitcYyy1KycdI=;
	b=FLk5NKPGExcT25Uiqu7WvUJlIOiqzsyne97bkfnGRQsy23+ziWRjCl6R5kGdG8eFp0
	v4+7YPYgf29BldaseuuGlnxgF6kvB3MXfysNBsH3+wwaZkg1oDc4GEHpC+k0qiwwk3xy
	VUkEeb69anUxMuvG9cZg3zHZsWv5Lo8lnqbl/V6c3Zx6KmyD95J1ZFqBjvDAHExyQWO1
	Mi7qsicr2Z0aNKDvhRiQbhW5OwTMbe6XSiy8lX9avhgNBpyenr5LSlN1rULTr5MwJdU+
	kNqC745uhWXc/KJ5xRRPgLafJnJBEqzml6nadnMEqy1SavWfsZ5Y5hrwDyXF6yijyXal
	EwRg==
X-Received: by 10.14.94.3 with SMTP id m3mr465893eef.54.1392173359253;
	Tue, 11 Feb 2014 18:49:19 -0800 (PST)
Original-Received: from [192.168.10.2] (62-36-157.netrun.cytanet.com.cy.
	[62.228.36.157])
	by mx.google.com with ESMTPSA id s46sm74396048eeb.0.2014.02.11.18.49.17
	for <multiple recipients>
	(version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Tue, 11 Feb 2014 18:49:18 -0800 (PST)
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:24.0) Gecko/20100101 Thunderbird/24.2.0
In-Reply-To: <jwvbnydw35g.fsf-monnier+emacs@gnu.org>
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2a00:1450:4013:c00::234
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:169543
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/169543>

On 12.02.2014 03:30, Stefan Monnier wrote:
> E.g. I had some font-lock code which would highlight an
> open-paren-in-column-0-in-string/comment with the `warning' face.
> So such an "incorrect" open paren would still cause incorrect
> highlighting, but the `warning' face on it would provide the clue as to
> what was the source of the problem.

I don't fully understand the explanation, but the logic "if the syntax 
beginning equals point, go to the previous syntax beginning" could have 
been handled in the specific `syntax-begin-function' instead.

> Right, but that largely defeats the purpose of syntax-ppss (which is to
> use caching to speed up (parse-partial-sexp (point-min) (point))).

The optimization is still used if `syntax-ppss' is called several times 
during the syntax-propertization or fontification of one region.

Same with indentation, if we did that.

> To give you some background: I think syntax-begin-function is basically
> useless.  It's used in very few places (it used to be used in lisp-mode,
> but that was disabled recently, it's still used in js-mode, but it
> should probably be disabled there as well, and apparently mmm-mode also
> uses it, but these are the only cases I know) and is more trouble than
> it's worth.  It was meant and is designed as an optimization, but it is
> vanishingly often useful.

Okay, I can understand that.

> One option is to have a hook that takes a (POS . PPSS) pair, which
> syntax-ppss intends to use as a starting point for parsing, and return
> a new such pair to use instead, where the returned position should
> always be >= POS.

Sounds fine to me. As long as the hook is called with point at the same 
position `syntax-ppss' is called at, we can check whether POS is in the 
same region, look for nested submode regions between POS and point, and 
either discard the passed PPSS if the current subregion begins after 
POS, or manually `parse-partial-sexp' each piece of the current 
subregion (or of the primary mode region, if we're there) between POS 
and some position closer to point.

We could parse all the way up to point itself, though. It wouldn't be 
any harder coding-wise (we'll be calling `parse-partial-sexp' anyway), 
and that way the hook could be more flexible. The meaning of the hook 
would then be "here's the last saved position and value, what will the 
value be at point?".
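
To make that concrete, here is a rough sketch of what such a hook 
function might do in the simplest case (no nested subregions). The name 
`my-chunk-ppss-at-point' and the CHUNK-BEG argument are made up for 
illustration; nothing like them exists in syntax.el or mmm-mode, and a 
real version would have to find the chunk start itself. It also assumes 
that both POS and CHUNK-BEG are at or before point.

```elisp
(defun my-chunk-ppss-at-point (start chunk-beg)
  "Return the parse state at point.
START is a cons (POS . PPSS), the cached position and state that
`syntax-ppss' would otherwise resume from.  CHUNK-BEG is the start
of the chunk containing point (hypothetical argument)."
  (let ((pos (car start))
        (ppss (cdr start)))
    (if (> chunk-beg pos)
        ;; The cached state belongs to text outside the current chunk:
        ;; discard it and parse from the chunk start with a fresh state.
        (parse-partial-sexp chunk-beg (point))
      ;; Same chunk: resume from the cached position and state.
      (parse-partial-sexp pos (point) nil nil ppss))))
```

With nested subregions, the second branch would turn into a loop that 
calls `parse-partial-sexp' once per piece and skips over the nested 
chunks.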

> This way, syntax-ppss could make full use of its cache, but mmm-mode
> could tell it about chunk boundaries (and decide what state to use at
> the beginning of a boundary).
>
> The main problem I see with this approach is that this hook would be
> called maybe too many times, so we'd want to improve the "fast path"
> (i.e. the first branch in syntax-ppss which tries to use
> syntax-ppss-last) so it can know when calling this new hook is unneeded.

Maybe we want that, but scanning the buffer for overlays should still be 
a) proportional to the distance between bounds, b) faster than 
`parse-partial-sexp', so at worst in mmm-mode the new scheme will just 
be slower than plain `syntax-ppss' by some constant ratio, on average.
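
For illustration, the overlay scan could look roughly like this. The 
`my-chunk' overlay property is a placeholder I made up here, not the 
property mmm-mode actually uses:

```elisp
(defun my-chunk-starts-between (pos end)
  "Return the sorted start positions of chunk overlays after POS.
Only overlays that overlap POS..END and carry the hypothetical
`my-chunk' property are considered."
  (let (starts)
    (dolist (ov (overlays-in pos end))
      (when (and (overlay-get ov 'my-chunk)
                 (> (overlay-start ov) pos))
        (push (overlay-start ov) starts)))
    (sort starts #'<)))
```

The work done here depends on the overlays found between the two 
bounds, not on the size of the whole buffer.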

> Maybe for that, the new hook should return not just a new (POS . PPSS)
> but also a "next-boundary" so we know we don't need to call this hook
> again as long as we're within POS...NEXT-BOUNDARY.

Not sure that will work. Suppose we're in a region that extends 400 
chars past point and is followed by another region.

We call `syntax-ppss', and the hook happily reports that the value at 
point (or some position near it) can be used until point + 400. Then we 
move a few chars forward and delete the rest of that region. 
NEXT-BOUNDARY is now stale, and calling `syntax-ppss' from the 
following region can return a wrong value.

Using markers should work better, but maybe some problems are lurking 
there as well.
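
A small demonstration of the difference, using a throwaway buffer. The 
function name and the positions are arbitrary, picked only for the 
example:

```elisp
(defun my-stale-boundary-demo ()
  "Return (INTEGER-BOUNDARY MARKER-BOUNDARY) after a deletion.
Both boundaries start out at position 450 in a 500-char buffer."
  (with-temp-buffer
    (insert (make-string 500 ?x))
    (let ((boundary-int 450)
          (boundary-marker (copy-marker 450)))
      ;; Delete the rest of the "region", up to the old boundary.
      (delete-region 60 450)
      ;; The integer is unchanged; the marker was relocated.
      (list boundary-int (marker-position boundary-marker)))))

(my-stale-boundary-demo) ; => (450 60)
```

After the deletion the integer points past the end of the buffer, while 
the marker has followed the text and sits where the next region now 
begins.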