From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: /srv/bzr/emacs/trunk r101338: * lisp/emacs-lisp/syntax.el (syntax-ppss): More sanity check to catch
Date: Wed, 12 Feb 2014 04:49:15 +0200
Message-ID: <52FAE12B.6060101@yandex.ru>
References: <87r47bi1e5.fsf@yandex.ru> <52F96284.50507@yandex.ru>
To: Stefan Monnier
Cc: emacs-devel@gnu.org

On 12.02.2014 03:30, Stefan Monnier wrote:

> E.g. I had some font-lock code which would highlight an
> open-paren-in-column-0-in-string/comment with the `warning' face.
> So such an "incorrect" open paren would still cause incorrect
> highlighting, but the `warning' face on it would provide the clue as to
> what was the source of the problem.
I don't fully understand the explanation, but the logic "if
syntax-beginning equals point, go to previous syntax-beginning" could've
been handled in the specific syntax-beginning-function instead.

> Right, but that largely defeats the purpose of syntax-ppss (which is to
> use caching to speed up (parse-partial-sexp (point-min) (point))).

The optimization is still used if `syntax-ppss' is called several times
during the syntax-propertization or fontification of one region. Same
with indentation, if we did that.

> To give you some background: I think syntax-begin-function is basically
> useless. It's used in very few places (it used to be used in lisp-mode,
> but that was disabled recently, it's still used in js-mode, but it
> should probably be disabled there as well, and apparently mmm-mode also
> uses it, but these are the only cases I know) and is more trouble than
> it's worth. It was meant and is designed as an optimization, but it is
> vanishingly often useful.

Okay, I can understand that.

> One option is to have a hook that takes a (POS . PPSS) pair, which
> syntax-ppss intends to use as a starting point for parsing, and return
> a new such pair to use instead, where the returned position should
> always be >= POS.

Sounds fine to me. As long as the hook is called at the same point
`syntax-ppss' is called at, we can check whether POS is in the same
region, look for nested submode regions between POS and point, and
either discard the passed PPSS if the current subregion begins after
POS, or manually `parse-partial-sexp' each piece of the current
subregion (or of the primary mode region, if we're there) between POS
and some position closer to point.

We could parse the buffer till point itself, though. It wouldn't be
harder coding-wise (we'll do `parse-partial-sexp's anyway), and that way
the hook could be more flexible. Then the meaning of the hook would be
"here's the last saved position and value, what will be the value at
point?".
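The "what will be the value at point?" reading of the hook can be sketched as a toy model. This is Python rather than Emacs Lisp, so nothing below is an Emacs API: `scan` is a hypothetical stand-in for `parse-partial-sexp' that tracks only paren depth, and `state_at_point` is a hypothetical stand-in for the hook. It assumes flat, non-overlapping submode chunks and a cached position that is not after point; nested regions are left out.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[int, int]  # half-open [start, end) submode region (assumption)


def scan(text: str, start: int, end: int, depth: int) -> int:
    """Toy stand-in for parse-partial-sexp: tracks paren depth only."""
    for ch in text[start:end]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return depth


def enclosing_chunk(chunks: List[Chunk], pos: int) -> Optional[Chunk]:
    """Return the submode chunk containing POS, or None for the primary region."""
    for start, end in chunks:
        if start <= pos < end:
            return (start, end)
    return None


def state_at_point(text: str, chunks: List[Chunk],
                   cached: Tuple[int, int], point: int) -> int:
    """Hypothetical hook: given the cached (pos, depth), return depth at point."""
    pos, depth = cached
    here = enclosing_chunk(chunks, point)
    if enclosing_chunk(chunks, pos) != here:
        # The cached state belongs to a different region: discard it and
        # restart from the beginning of the region that contains point.
        pos, depth = (here[0] if here else 0), 0
    if here is not None:
        # Same submode chunk: one scan from the starting position is enough.
        return scan(text, pos, point, depth)
    # Primary region: scan it piece by piece, skipping submode chunks.
    for start, end in sorted(chunks):
        if pos <= start and end <= point:
            depth = scan(text, pos, start, depth)
            pos = end
    return scan(text, pos, point, depth)


# "<((>" is a submode chunk embedded in the primary text "(a" + "b)c".
text = "(a<((>b)c"
chunks = [(2, 6)]
at_end = state_at_point(text, chunks, (0, 0), len(text))
in_chunk = state_at_point(text, chunks, (0, 0), 5)
```

Here the unbalanced parens inside the chunk do not leak into the primary region's state, and a query inside the chunk ignores the cached state taken from the primary region.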
> This way, syntax-ppss could make full use of its cache, but mmm-mode
> could tell it about chunk boundaries (and decide what state to use at
> the beginning of a boundary).
>
> The main problem I see with this approach is that this hook would be
> called maybe too many times, so we'd want to improve the "fast path"
> (i.e. the first branch in syntax-ppss which tries to use
> syntax-ppss-last) so it can know when calling this new hook is unneeded.

Maybe we want that, but scanning the buffer for overlays should still be
a) proportional to the distance between bounds, b) faster than
`parse-partial-sexp', so at worst in mmm-mode the new scheme will just
be slower than plain `syntax-ppss' by some constant ratio, on average.

> Maybe for that, the new hook should return not just a new (POS . PPSS)
> but also a "next-boundary" so we know we don't need to call this hook
> again as long as we're within POS...NEXT-BOUNDARY.

Not sure if it'll work. Suppose we're in some region, which spans 400
chars after point, and then it's another region. We call `syntax-ppss',
happily report to it that the value at point (or some position near it)
can be used until point + 400. Then move a few chars lower and delete
the rest of the given region. NEXT-BOUNDARY becomes stale, and calling
`syntax-ppss' from the region below can return a wrong value.

Using markers should work better, but maybe some problems are lurking
there as well.
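The stale NEXT-BOUNDARY scenario can be made concrete with a toy model. Again this is Python, not Emacs Lisp: `Marker` is a hypothetical analogue of an Emacs marker (a position that is adjusted when text before it is deleted), `fast_path_ok` is a hypothetical version of the "no need to call the hook" check, and the positions 100, 105, 200 and 500 are made up to match the 400-char example. It only shows why a plain integer boundary goes stale; it says nothing about whatever other problems markers might have.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    """Toy analogue of a marker: a position that follows deletions."""
    pos: int

    def after_delete(self, start: int, end: int) -> None:
        """Adjust for the deletion of the half-open range [start, end)."""
        if self.pos >= end:
            self.pos -= end - start
        elif self.pos > start:
            self.pos = start


def fast_path_ok(point: int, cached_pos: int, boundary: int) -> bool:
    """Hypothetical fast-path test: is the cached state still usable at point?"""
    return cached_pos <= point < boundary


# The cached state was computed at 100 and reported valid up to 100 + 400.
cached_pos = 100
int_boundary = 500              # plain integer NEXT-BOUNDARY
marker_boundary = Marker(500)   # the same boundary, tracked like a marker

# Move a few chars lower and delete the rest of the region: [105, 500).
marker_boundary.after_delete(105, 500)

# Position 200 now lies in the region below, which begins at 105.
query = 200
stale = fast_path_ok(query, cached_pos, int_boundary)
tracked = fast_path_ok(query, cached_pos, marker_boundary.pos)
```

With the integer boundary the check still passes for a position that now belongs to the region below, so the cached value would be reused wrongly; with the tracked boundary the check fails and the hook would be consulted again.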