From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Vladimir Kazanov <vekazanov@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Tokenizing
Date: Sun, 21 Sep 2014 21:55:46 +0300
Message-ID: <CAAs=0-0jNfgkBVoQaV+GWBRCu=-yLteJU83+4EbVin4a5fqTGg@mail.gmail.com>
References: <CAAs=0-1-1KCMOvWm7+kt8Ddiu1BZznoUpB2SVFAM_fV3zNVE8A@mail.gmail.com>
	<E1XV2XA-0002kq-7t@fencepost.gnu.org>
	<CAAs=0-0Jr7kiJuqmG_sxrO6VAMf7NEoJ-FAed-ZNE0H5srBJ8A@mail.gmail.com>
	<E1XVKb1-0006SZ-Un@fencepost.gnu.org>
	<CAAs=0-05L9nV69HPWTfEgtEq_x8LiGivmCo7vOJEr5dDJMVCsg@mail.gmail.com>
	<85ha01dm5u.fsf@stephe-leake.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1411325787 20596 80.91.229.3 (21 Sep 2014 18:56:27 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 21 Sep 2014 18:56:27 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Stephen Leake <stephen_leake@stephe-leake.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Sep 21 20:56:19 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
In-Reply-To: <85ha01dm5u.fsf@stephe-leake.org>
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2607:f8b0:4001:c03::234
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:174618
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/174618>

> I don't normally edit 7000 line files, so the Ada mode parsing delay is
> not noticeable to me, so I prefer the current Ada mode approach of not
> using the idle timer to trigger a parse. But it could be a user option.
>

I will look into that, although the main idea is to *keep the token
list consistent* most of the time. There will definitely be
customization options.

>
> Ada mode uses text properties to store parse results; the tokenizer
> results are part of that, but are not stored separately. I don't see
> much point in separating the tokenizer from the parser; the tokenizer
> results are not useful by themselves (at least, not in Ada mode).
>

First, this is not quite right. Tokenization results can be used, for
example, for granular syntax highlighting. Font Lock basically just
uses regexps to catch something that looks like
comments/keywords/whatever. A tokenizer already *knows* for sure what
it found. And you don't have to build a full parser to use the results.
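
A minimal sketch of the highlighting point, in Python for brevity
rather than Emacs Lisp. The toy grammar, the keyword list and the
function names are assumptions made up for illustration; only the face
names are real Font Lock faces. Because the tokenizer consumes the
whole string literal as one token, the "--" inside it is never
mistaken for a comment:

```python
import re

# Toy grammar (hypothetical): Ada-like "--" comments, double-quoted
# strings, a handful of keywords. Alternatives are tried in order.
TOKEN_RE = re.compile(r"""
    (?P<comment>--[^\n]*)
  | (?P<string>"[^"\n]*")
  | (?P<keyword>\b(?:begin|end|if|then)\b)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<space>\s+)
  | (?P<other>.)
""", re.VERBOSE)

# Token kinds that get a face; everything else stays unhighlighted.
FACES = {
    "comment": "font-lock-comment-face",
    "string": "font-lock-string-face",
    "keyword": "font-lock-keyword-face",
}


def tokenize(text):
    """Yield (kind, start, end) for every token in text."""
    for m in TOKEN_RE.finditer(text):
        yield m.lastgroup, m.start(), m.end()


def highlight(text):
    """Return (start, end, face) spans derived directly from tokens."""
    return [(start, end, FACES[kind])
            for kind, start, end in tokenize(text)
            if kind in FACES]
```

Calling highlight() on a line such as
'if x then "-- not a comment" -- real comment' gives one string span
and one comment span, with no parser involved.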

Second, it is not a tokenizer I want to build; there is a
misunderstanding of sorts. It is a helper mode (similar to Font Lock,
in a way) for keeping token lists up to date all the time, easily and
quickly. User code - the tokenizer itself - will just have to provide
an interface to the mode (be restartable and supply the required
restart information in the resulting tokens). The mode will use that
information to avoid extra tokenizing.
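
A rough sketch of how such a helper could avoid extra tokenizing,
again in Python and not the actual design. All names here (Token,
next_token, update_tokens) are hypothetical. It assumes the simplest
possible case: a stateless tokenizer with one character of lookahead,
so the only restart information needed is each token's start offset. A
tokenizer with internal state would have to record that state in its
tokens as well.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str
    start: int  # doubles as the restart point in this toy case
    end: int


TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<punct>[^\w\s])")


def next_token(text, pos):
    """Restartable tokenizer: scan exactly one token starting at pos."""
    m = TOKEN_RE.match(text, pos)
    return Token(m.lastgroup, m.start(), m.end())


def full_tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        tokens.append(next_token(text, pos))
        pos = tokens[-1].end
    return tokens


def update_tokens(old_tokens, new_text, edit_pos, deleted, inserted):
    """Return (tokens, rescanned) after an edit at edit_pos that removed
    `deleted` characters and inserted `inserted` characters."""
    delta = inserted - deleted
    edit_end = edit_pos + inserted  # end of the edit in new_text
    # Tokens ending strictly before the edit cannot have changed.
    first = 0
    while first < len(old_tokens) and old_tokens[first].end < edit_pos:
        first += 1
    tokens = old_tokens[:first]
    pos = old_tokens[first].start if first < len(old_tokens) else 0
    rescanned, j = 0, first
    while pos < len(new_text):
        if pos >= edit_end:
            # Past the edit: look for an old token starting right here.
            while j < len(old_tokens) and old_tokens[j].start + delta < pos:
                j += 1
            if j < len(old_tokens) and old_tokens[j].start + delta == pos:
                # Back in sync: reuse the old tail, shifted by delta.
                tokens.extend(Token(t.kind, t.start + delta, t.end + delta)
                              for t in old_tokens[j:])
                return tokens, rescanned
        tok = next_token(new_text, pos)
        tokens.append(tok)
        rescanned += 1
        pos = tok.end
    return tokens, rescanned
```

Inserting one character into the middle of a word rescans only that
word; the rest of the list is reused. Shifting the tail is still
linear here, which markers or relative offsets could avoid in a real
buffer.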

> I have not noticed any problems with the text properties interface; in
> particular, storing and retrieving text properties is fast compared to
> parsing. Ada mode stores about two parse result text properties per
> source line on average.

I did not know about your mode - and parsers are sort of my hobby :-)
I will definitely check it out, especially because it uses GLR (does
it really?!), which can be non-trivial to implement.


-- 
Yours sincerely,


Vladimir Kazanov

