From: Vladimir Kazanov
To: Stephen Leake
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Tokenizing
Date: Sun, 21 Sep 2014 21:55:46 +0300
References: <85ha01dm5u.fsf@stephe-leake.org>

> I don't normally edit 7000 line files, so the Ada mode parsing delay is
> not noticeable to me, so I prefer the current Ada mode approach of not
> using the idle timer to trigger a parse. But it could be a user option.

I will look into that, although the main idea is to *keep the token
list consistent* most of the time. There will definitely be
customization possibilities.

> Ada mode uses text properties to store parse results; the tokenizer
> results are part of that, but are not stored separately. I don't see
> much point in separating the tokenizer from the parser; the tokenizer
> results are not useful by themselves (at least, not in Ada mode).

First, this is not quite right.
Tokenization results can be used, for example, for
granular syntax highlighting. Font Lock basically just uses regexps to
catch something that looks like comments/keywords/whatever. A tokenizer
already *knows* for sure what it found. And you don't have to build a
full parser to use the results.

Second, it is not a tokenizer that I want to build; there is a
misunderstanding of sorts. It is a helper mode (similar to Font Lock, in
a way) for keeping token lists up to date all the time, easily and
quickly. User code - the tokenizer itself - will just have to provide an
interface to the mode (be restartable and supply the required restart
information in the resulting tokens). The mode will use that information
to avoid extra tokenizing.

> I have not noticed any problems with the text properties interface; in
> particular, storing and retrieving text properties is fast compared to
> parsing. Ada mode stores about two parse result text properties per
> source line on average.

I did not know about your mode - and parsers are sort of my hobby :-) I
will definitely check it out, especially because it uses GLR (it really
does?!), which can be non-trivial to implement.

--
Yours sincerely,

Vladimir Kazanov
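
P.S. To make the "restartable tokenizer" idea above concrete, here is a
minimal sketch in Python. It is not the actual mode and not Emacs Lisp:
the names `Token`, `tokenize` and `retokenize`, and the toy token
grammar, are all assumptions made for illustration. The lexer here is
stateless, so the only restart information a token needs is its start
position; a real tokenizer with modes (inside a comment, inside a
string) would presumably have to store that state in each token too.

```python
import re
from dataclasses import dataclass

# Hypothetical toy grammar: whitespace, numbers, identifiers, any other char.
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)|(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>.)", re.S
)


@dataclass(frozen=True)
class Token:
    kind: str
    start: int
    end: int


def tokenize(text, pos=0):
    """Yield tokens starting at pos; can be restarted at any token boundary."""
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)  # the 'op' branch always matches
        yield Token(m.lastgroup, pos, m.end())
        pos = m.end()


def retokenize(old_tokens, new_text, edit_start, old_len, new_len):
    """Update a token list after old_len chars at edit_start were replaced
    by new_len chars. Returns (tokens, number_of_freshly_scanned_tokens)."""
    delta = new_len - old_len
    edit_old_end = edit_start + old_len

    # Keep tokens that end strictly before the edit. A token that ends
    # exactly at edit_start could be extended by the new text, so rescan it.
    keep = 0
    while keep < len(old_tokens) and old_tokens[keep].end < edit_start:
        keep += 1
    prefix = old_tokens[:keep]
    restart = prefix[-1].end if prefix else 0

    # Old tokens that start after the edited region, keyed by new position.
    tail = {
        t.start + delta: i
        for i, t in enumerate(old_tokens)
        if t.start >= edit_old_end
    }

    fresh = []
    for tok in tokenize(new_text, restart):
        i = tail.get(tok.start)
        if i is not None:
            # Resynchronized with the old list: reuse the rest, shifted.
            shifted = [
                Token(t.kind, t.start + delta, t.end + delta)
                for t in old_tokens[i:]
            ]
            return prefix + fresh + shifted, len(fresh)
        fresh.append(tok)
    return prefix + fresh, len(fresh)
```

For example, replacing "bar" with "quux" in "foo bar 42 baz" rescans only
the whitespace before the edit and the new identifier; every token from
the space after "quux" onwards is reused with shifted positions.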