From: Paul Pogonyshev
Newsgroups: gmane.emacs.devel
Subject: Re: generic buffer parsing cache data
Date: Sun, 1 Jul 2007 16:41:58 +0300
Message-ID: <200707011641.58414.pogonyshev@gmx.net>
To: emacs-devel@gnu.org
Cc: martin rudalics

martin rudalics wrote:
> > I propose to add something generic. For instance, Python mode needs to
> > know indentation level of blocks. It seems that `syntax-ppss` doesn't
> > return it at all. And adding everything that might ever be needed by
> > some XYZ mode seems counter-productive and complicates an already complex
> > function and its return value.
> >
> > I just mean that major modes can have needs beyond that suited by
> > `syntax-ppss`. And as far as I can see, they can either parse half of
> > the buffer each time they need something, or invent some ad-hoc custom
> > code for caching such data.
>
> Like `c-state-cache'. Well, `syntax-ppss' can only do whatever
> `parse-partial-sexp' does. Occasionally, that's not even sufficient for
> the Elisp case (look how `lisp-font-lock-syntactic-face-function'
> strives for detecting doc-strings).
> I'd appreciate if you came up with
> something more "generic" (if you just could give a clear description of
> that term).

For instance, something like this:

  Function: put-cache-data key data &optional pos
      Store cache DATA with the given KEY in the current buffer at
      position POS (if POS is not specified, at point).

  Function: get-cache-data key &optional pos
      Return the cache data associated with KEY in the current buffer at
      position POS (if POS is not specified, at point). If no data with
      that KEY is stored at that position, or if it has been
      invalidated, return nil.

Internally, the Emacs core (at the C level) automatically invalidates
cache data from X onwards whenever buffer text from X to Y (Y >= X)
changes in any way. Whether invalidated data is actively removed from
internal storage or merely marked invalid is an implementation detail
and irrelevant at the Elisp level.

It is unclear whether changes to text properties should also invalidate
the cache. Probably not, at least by default.

It would also make sense to define some `anchors': ways of partitioning
a buffer so that changes in one part don't invalidate cache data in
other parts. For instance, Python mode would set anchors wherever a
toplevel block is defined, since its parsing stops on reaching a
toplevel block anyway. However, this can be added later. One open
question, for instance, is when and how to remove anchors. (E.g. in
Python mode, if a toplevel block is re-indented to another level, it
should stop being an anchor.)

The major mode must store cache data at some logical position, so that
it can find the data again later.

Maybe it also makes sense to add

  Function: find-cache-data key &optional pos
      Find and return cache data with KEY at POS (or at point) or
      _before it_. Return nil if there is no (valid) cached data with
      that KEY at POS or anywhere before it.

However, I don't see any obvious way of using it.
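To make the proposed semantics concrete, here is a minimal sketch in Python that models `put-cache-data', `get-cache-data' and `find-cache-data' together with the "invalidate from X onwards" rule. It is only a model under stated assumptions: positions are plain integers, nil is represented by None, and the class `BufferCache' and its method names (including `text_changed', standing in for the C-level change notification) are hypothetical; nothing here is an existing Emacs API.

```python
import bisect
from typing import Any, Dict, List, Optional


class BufferCache:
    """Hypothetical model of the proposed per-buffer cache (not an Emacs API)."""

    def __init__(self) -> None:
        # Per key: a sorted list of positions plus a position -> data map.
        self._positions: Dict[str, List[int]] = {}
        self._data: Dict[str, Dict[int, Any]] = {}

    def put(self, key: str, data: Any, pos: int) -> None:
        """Like put-cache-data: store DATA under KEY at POS."""
        positions = self._positions.setdefault(key, [])
        entries = self._data.setdefault(key, {})
        if pos not in entries:
            bisect.insort(positions, pos)
        entries[pos] = data

    def get(self, key: str, pos: int) -> Optional[Any]:
        """Like get-cache-data: data stored exactly at POS, else None (nil)."""
        return self._data.get(key, {}).get(pos)

    def find(self, key: str, pos: int) -> Optional[Any]:
        """Like find-cache-data: data at POS or at the nearest position before it."""
        positions = self._positions.get(key, [])
        i = bisect.bisect_right(positions, pos)
        return self._data[key][positions[i - 1]] if i > 0 else None

    def text_changed(self, start: int) -> None:
        """Text from START to some END >= START changed: drop all entries at
        positions >= START. Earlier entries keep valid positions, because
        nothing before START moved."""
        for key, positions in self._positions.items():
            cut = bisect.bisect_left(positions, start)
            for pos in positions[cut:]:
                del self._data[key][pos]
            del positions[cut:]


if __name__ == "__main__":
    cache = BufferCache()
    cache.put("indent", 0, 1)
    cache.put("indent", 4, 20)
    cache.put("indent", 8, 45)
    print(cache.find("indent", 30))   # 4 (nearest entry before 30 is at 20)
    cache.text_changed(20)            # e.g. text between 20 and 25 was edited
    print(cache.get("indent", 45))    # None
    print(cache.find("indent", 100))  # 0
```

Because everything from X onwards is dropped, entries before X never need their positions adjusted, which is what keeps this rule simple. Anchors would presumably complicate it: data surviving after a change inside an earlier partition would have to be shifted to its new position.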
As I see it, modes should access cache data like this (in pseudocode):

    mode-get-cache-data:
        data = (get-cache-data mode-key)
        if data is nil:
            data = (mode-compute-cache-data)
            (put-cache-data mode-key data)
        return data

    mode-compute-cache-data:
        save-excursion:
            if travel-to-higher-level-cache-point succeeds:
                higher-level-data = (mode-get-cache-data)
            else:
                # no higher level (e.g. at a toplevel block): stop recursing
                higher-level-data = nil
        data = (mode-compute-data-from-higher-level higher-level-data)
        return data

Here `higher-level' is not the same as `previous'. For instance, in
Python mode it makes sense to compute indentation from the block the
current one is nested in, not from the previous block:

    class X:
        class Y:                    # <-- higher-level block for the current block
            class Z:
                def bla ():         # <-- previous block (with cached data)
                    pass
            def __init__(self):     # <-- current block
                pass

Paul
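The get/compute access pattern and the Python example can be sketched in Python as follows. Assumptions: a "position" is a line index, the cached data is the nesting depth of a block header, and the higher-level cache point is found by scanning backwards for the nearest code line indented less than the current one (comment-only lines are skipped; tabs and continuation lines are not handled). `BlockDepthCache' is a hypothetical name, and a plain dict stands in for the proposed `get-cache-data'/`put-cache-data'.

```python
from typing import Dict, List, Optional


def indentation(line: str) -> int:
    """Count leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


class BlockDepthCache:
    """Hypothetical model of mode-get-cache-data / mode-compute-cache-data.

    Positions are line indices; the cached data for a block header is its
    nesting depth (0 for a toplevel block).
    """

    def __init__(self, lines: List[str]) -> None:
        self.lines = lines
        self.cache: Dict[int, int] = {}  # stands in for get/put-cache-data
        self.computed: List[int] = []    # lines whose data had to be computed

    def higher_level_point(self, lineno: int) -> Optional[int]:
        """Return the enclosing block header: the nearest preceding code line
        indented less than LINENO, or None if LINENO is at toplevel."""
        own = indentation(self.lines[lineno])
        for i in range(lineno - 1, -1, -1):
            text = self.lines[i]
            is_code = text.strip() != "" and not text.lstrip().startswith("#")
            if is_code and indentation(text) < own:
                return i
        return None

    def get(self, lineno: int) -> int:
        """mode-get-cache-data: use cached data, computing it on a miss."""
        if lineno not in self.cache:
            self.cache[lineno] = self.compute(lineno)
        return self.cache[lineno]

    def compute(self, lineno: int) -> int:
        """mode-compute-cache-data: derive data from the higher level only."""
        self.computed.append(lineno)
        parent = self.higher_level_point(lineno)
        if parent is None:
            return 0  # toplevel: no higher level to recurse into
        return self.get(parent) + 1


if __name__ == "__main__":
    source = [
        "class X:",
        "    class Y:",
        "        class Z:",
        "            def bla ():",
        "                pass",
        "        def __init__(self):",
        "            pass",
    ]
    cache = BlockDepthCache(source)
    print(cache.get(3), cache.computed)  # 3 [3, 2, 1, 0]
    print(cache.get(5), cache.computed)  # 2 [3, 2, 1, 0, 5]
```

Querying `def bla ()' first computes and caches the depths of `class Z', `class Y' and `class X' on the way up. The later query for `def __init__' then computes only its own entry, because it builds on the cached data of `class Y', its higher-level block, rather than on that of the previous block.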