From: Stephen Leake
Newsgroups: gmane.emacs.devel
Subject: Re: syntax-propertize-function vs indentation lexer
Date: Fri, 31 May 2013 03:45:07 -0400
Message-ID: <85k3mf8uf0.fsf@member.fsf.org>
References: <85mwrdbypv.fsf@member.fsf.org> <85bo7sbzhh.fsf@member.fsf.org>
In-reply-to: (Stefan Monnier's message of "Thu, 30 May 2013 10:02:59 -0400")
To: emacs-devel@gnu.org

Stefan Monnier writes:

>> The doc string for syntax-propertize-function only mentions
>> font-lock, not indentation etc; it should say "most syntax uses", or
>> better, list all the places it is called.
>
> Oops, indeed it singles out font-lock. I just installed the patch below
> which should address this problem.

Looks good, thanks.

>> The later; I'm parsing the entire buffer with an LALR parser in
>> ada-mode, and whenever it changes,
>
> Sounds expensive. How does it cope with large buffers?

Not clear yet - I'm still getting the Ada grammar right.

The parser is actually generalized LALR, which spawns parallel parsers
for grammar conflicts and ambiguities. So it can be very slow when the
grammar has too many conflicts or is ambiguous - running 64 parsers in
parallel is a lot slower than running 1 :). But it works well when the
conflict can be resolved in a few tokens, and is much easier than
reconstructing the grammar to eliminate the conflict.

>> and caching the results for use by indent. So far it's quite fast.
>
> How much time does it take to open a 1MB file?

I've never seen a 1MB Ada source code file. Such a file would never be
accepted in any project I have worked on, in any source language, unless
it was generated from some other source.
In which case it should not be edited by hand, and should only be read
rarely. So I don't think that's a realistic use case, and there is a
reasonable limit to file size.

Of course, it should be possible to open such a file in any case, so
perhaps I'll need an explicit limit to disable parsing on large files.

But any discussion of parser speed is premature at this point.

>> So I need to call
>>     (syntax-propertize (point-max))
>> in ada-mode
>
> I wouldn't put it in ada-mode, no. Instead, I'd put it closer to the
> code that actually needs those properties to be applied. E.g. I'd
> either put it in the LALR parser code (if that code needs the syntax
> properties) or in the indentation code.

There may be other code, completely independent of the parser, that
relies on syntax; imenu, for example.

I'm also using the cached data for navigation (moving from 'if' to
'then' to 'elsif' to 'end if' etc); that is logically independent of
indentation (but not of the parser, of course).

> Note that calling
> syntax-propertize repeatedly is cheap: if the region has already been
> handled, it returns almost instantly since it begins with
>
>     (when (and syntax-propertize-function
>                (< syntax-propertize--done pos))

Yes, that does help.

> Also I probably wouldn't put (syntax-propertize (point-max)), but
> instead use (syntax-propertize end) where `end' is the end of the region
> being currently LALR-parsed or being considered by the indentation
> code.

I considered that. Since the parser is asynchronous from the
indentation, it would have to go in the parser (actually lexer) code.
wisi-forward-token would be a logical place.

But what would be the right guess for 'end'? The first step in
wisi-forward-token is forward-comment, which can skip quite large
portions of the buffer.

LALR always parses an entire top-level grammar structure. For Ada files,
that is the whole file, for all the coding standards I'm aware of.
The language itself allows for more than one per file, but doing that
messes up dependency analysis, and prevents minimal recompilation.

So the only reasonable guess for 'end', for Ada, is point-max.

There may be other reasonable guesses for other languages, so a
language-specific hook might be a good choice. C++, for example, often
has multiple classes per file; each class would be a top-level grammar
structure. But scanning for the end of the class logically requires
running syntax-propertize (maybe not actually for C++, but some language
might require that), so we've got a real problem. (C++ does require
running the macro preprocessor before any scanning, which is very
expensive). I'm not worrying about that right now, but (point-max) is a
cheap and always correct answer.

>>     (syntax-ppss-flush-cache begin)
>>     (syntax-propertize end)
>> in the after-change hook.
>
> You might want to put the syntax-ppss-flush-cache there (although
> syntax.el should already take care of that, normally),

How does syntax.el take care of this? The only function on
after-change-functions by default is jit-lock-after-change. And that's
only there if font-lock is on.

I have been implicitly assuming syntax-ppss is correct after a text
change, but I never investigated how that worked.

> but the
> syntax-propertize doesn't belong there either (since it belong to the
> code that actually uses those properties, i.e. either the parser or the
> indentation).

Syntax properties are closely tied to the text (they are an extension of
the syntax table), and used by several independent functions, and thus
should be kept consistent with the text as much as possible. So
syntax-propertize should be run whenever the text changes.

The same could be said for the cached parse results; the parser should
also be run from after-change-functions. I'm not going that far (yet)
because I'm still debugging the parser, and don't want it called
automatically too early.
But that may be the right move eventually, to support imenu etc.

Another design choice would be to have all the low-level functions that
rely on syntax (forward-comment, forward-word, etc) call
syntax-propertize. That would certainly be more transparent, and is
consistent with what you are advocating. But that runs into the
'reasonable guess for end' problem; I think the language mode is the
best place to resolve that problem. A language hook to provide the guess
would be reasonable, but that hook could be expensive (since it reduces
parser time, which is even more expensive), and thus should not be
called more often than necessary (certainly not for every call of
forward-comment).

I think you are actually advocating for a third choice: any code that
depends on low-level syntax functions must be aware of
syntax-propertize, and call it appropriately. That makes sense.

It would help if the doc string for parse-partial-sexp mentioned
syntax-propertize and syntax-ppss-flush-cache; then I would have been
aware of this issue sooner.

-- 
-- Stephe
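
The "third choice" discussed in this message (code that relies on
low-level syntax functions calls syntax-propertize itself, with a
per-language guess for 'end') could look roughly like the sketch below.
It is only an illustration under stated assumptions: every `my-' name is
hypothetical and nothing here is taken from ada-mode, wisi, or
syntax.el. The only real calls used are syntax-propertize,
syntax-ppss-flush-cache, add-hook, and point-max.

```elisp
;; Sketch only.  Every `my-' name is hypothetical; none of this is
;; taken from ada-mode, wisi, or syntax.el.
(require 'syntax)

(defvar my-parse-end-function #'point-max
  "Function returning the guessed end of the region about to be parsed.
The default, `point-max', is the cheap and always-correct guess; a
language mode could install a narrower guess here.")

(defun my-ensure-syntax ()
  "Apply syntax properties up to the guessed parse end.
Meant to be called by the lexer before it relies on syntax functions
such as `forward-comment'."
  (syntax-propertize (funcall my-parse-end-function)))

(defun my-after-change (begin _end _length)
  "Discard cached `syntax-ppss' state at and after BEGIN."
  (syntax-ppss-flush-cache begin))

(defun my-setup ()
  "Install `my-after-change' buffer-locally."
  (add-hook 'after-change-functions #'my-after-change nil t))
```

Keeping the guess in a variable means the lexer asks for it once per
parse rather than on every forward-comment call, which is the cost
concern raised above.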