From: Stephen Leake
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: Re: Tokenizing
Date: Sun, 21 Sep 2014 10:32:29 -0500
Message-ID: <85ha01dm5u.fsf@stephe-leake.org>
In-Reply-To: (Vladimir Kazanov's message of "Sat, 20 Sep 2014 19:36:16 +0300")

Vladimir Kazanov writes:

> Okay, I'll give text properties a try.
>
> Right now my vision for this mode is the following:
>
> - avoid retokenizing undamaged buffer parts at all costs (as a main
> feature meant for incremental parsing);

You might look at what I did in Ada mode (current source in ELPA); see
wisi.el wisi-before/after-change.

> - collect damages and do reparsing only when user stops editing,
> similar to the font-lock-mode (js2-mode, nxml-mode...);

Ada mode only reparses when the user requests an action that requires a
parse.

How else do you tell when a user "stops editing"? font-lock runs after
'idle-time', which appears to be about 2 seconds (I could not figure out
from the structure of 'timer-idle-list' what the actual idle time is). I
guess that's the approximation of when the user stops editing.

I don't normally edit 7000 line files, so the Ada mode parsing delay is
not noticeable to me, so I prefer the current Ada mode approach of not
using the idle timer to trigger a parse. But it could be a user option.

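The on-demand approach described above can be sketched roughly as
follows. This is a minimal illustration, not the wisi.el code: every
`my-tok-' name is hypothetical, and `my-tok-reparse' is a placeholder
for a real language-specific parser. Only `after-change-functions',
`add-hook', `remove-hook' and `define-minor-mode' are standard Emacs
facilities.

```elisp
;; Hypothetical sketch; none of the `my-tok-' names exist in wisi.el.

(defvar-local my-tok-damage-start nil
  "Earliest buffer position changed since the last parse, or nil if clean.")

(defun my-tok-after-change (beg _end _old-len)
  "Remember BEG as damaged; text before the earliest damage stays valid."
  (setq my-tok-damage-start
        (if my-tok-damage-start
            (min my-tok-damage-start beg)
          beg)))

(defun my-tok-reparse (start)
  "Placeholder for a language-specific reparse beginning at START."
  (ignore start))

(defun my-tok-ensure-parsed ()
  "Reparse from the earliest damage, if there is any.
Return the position the reparse started from, or nil if nothing was damaged."
  (when my-tok-damage-start
    (let ((start my-tok-damage-start))
      (my-tok-reparse start)
      (setq my-tok-damage-start nil)
      start)))

(define-minor-mode my-tok-mode
  "Track damaged text so that a parse can be run on demand."
  :lighter " Tok"
  (if my-tok-mode
      (add-hook 'after-change-functions #'my-tok-after-change nil t)
    (remove-hook 'after-change-functions #'my-tok-after-change t)))
```

Commands that need parse results (indentation, navigation) would call
`my-tok-ensure-parsed' first. The user-option variant could instead
arrange for `run-with-idle-timer' to call the same function, at the cost
of parsing while the user may still be mid-edit.
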
> - the incremental logic should have two interfaces, the first one
> meant for language-specific tokenizing code and a second one - for the
> user code, be it code beautifiers or advanced incremental parsers;
>
> - it should be possible to completely replace the font-lock-mode with
> this mode, given a concrete language tokenizer;
>
> You said two things basically: 1) I must use text properties, 2) it is
> possible to improve text properties interfaces to help the tokenizer.
> I suggest the following plan:
>
> 1) try to implement the tokenizer using available text property
> mechanics;

Ada mode uses text properties to store parse results; the tokenizer
results are part of that, but are not stored separately.

I don't see much point in separating the tokenizer from the parser; the
tokenizer results are not useful by themselves (at least, not in Ada
mode).

> 2) see if there are slow-downs or problems, or space for improvements
> on the Emacs side.

I have not noticed any problems with the text properties interface; in
particular, storing and retrieving text properties is fast compared to
parsing. Ada mode stores about two parse result text properties per
source line on average.

--
-- Stephe
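
For readers unfamiliar with the mechanism, caching a parse result in a
text property can look roughly like the sketch below. The property name
`my-tok-cache' and the three function names are assumptions made for
illustration; Ada mode's actual property names and data layout are not
shown here. The calls used (`put-text-property', `get-text-property',
`remove-text-properties', `with-silent-modifications') are standard
Emacs Lisp.

```elisp
;; Hypothetical sketch: `my-tok-cache' is an invented property name.

(defun my-tok-store (beg end result)
  "Cache RESULT (any Lisp value) on the text between BEG and END."
  ;; `with-silent-modifications' keeps the buffer unmodified and
  ;; prevents change hooks from running for this bookkeeping.
  (with-silent-modifications
    (put-text-property beg end 'my-tok-cache result)))

(defun my-tok-at (pos)
  "Return the cached parse result at POS, or nil if there is none."
  (get-text-property pos 'my-tok-cache))

(defun my-tok-invalidate-from (pos)
  "Drop every cached result from POS to the end of the buffer."
  (with-silent-modifications
    (remove-text-properties pos (point-max) '(my-tok-cache nil))))
```

A parser would call `my-tok-store' for each construct it recognizes,
and a change hook would call `my-tok-invalidate-from' at the start of
the edited region, so that results before the edit survive untouched.
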