From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Stephen Leake <stephen_leake@stephe-leake.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Tokenizing
Date: Mon, 22 Sep 2014 08:15:51 -0500
Message-ID: <85zjdrkd88.fsf@stephe-leake.org>
References: <CAAs=0-1-1KCMOvWm7+kt8Ddiu1BZznoUpB2SVFAM_fV3zNVE8A@mail.gmail.com>
	<E1XV2XA-0002kq-7t@fencepost.gnu.org>
	<CAAs=0-0Jr7kiJuqmG_sxrO6VAMf7NEoJ-FAed-ZNE0H5srBJ8A@mail.gmail.com>
	<E1XVKb1-0006SZ-Un@fencepost.gnu.org>
	<CAAs=0-05L9nV69HPWTfEgtEq_x8LiGivmCo7vOJEr5dDJMVCsg@mail.gmail.com>
	<85ha01dm5u.fsf@stephe-leake.org>
	<CAAs=0-0jNfgkBVoQaV+GWBRCu=-yLteJU83+4EbVin4a5fqTGg@mail.gmail.com>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: ger.gmane.org 1411391808 31303 80.91.229.3 (22 Sep 2014 13:16:48 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 22 Sep 2014 13:16:48 +0000 (UTC)
To: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Sep 22 15:16:41 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1XW3Ts-00054w-RV
	for ged-emacs-devel@m.gmane.org; Mon, 22 Sep 2014 15:16:41 +0200
Original-Received: from localhost ([::1]:46577 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1XW3Ts-0002Di-1f
	for ged-emacs-devel@m.gmane.org; Mon, 22 Sep 2014 09:16:40 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:49765)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen_leake@stephe-leake.org>) id 1XW3TO-00021l-5m
	for emacs-devel@gnu.org; Mon, 22 Sep 2014 09:16:17 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stephen_leake@stephe-leake.org>) id 1XW3TF-0002Py-Px
	for emacs-devel@gnu.org; Mon, 22 Sep 2014 09:16:10 -0400
Original-Received: from dnvrco-outbound-snat.email.rr.com ([107.14.73.227]:30533
	helo=dnvrco-oedge-vip.email.rr.com)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen_leake@stephe-leake.org>) id 1XW3TF-0002P2-LS
	for emacs-devel@gnu.org; Mon, 22 Sep 2014 09:16:01 -0400
Original-Received: from [70.94.38.149] ([70.94.38.149:50438] helo=TAKVER)
	by dnvrco-oedge02 (envelope-from <stephen_leake@stephe-leake.org>)
	(ecelerity 3.5.0.35861 r(Momo-dev:tip)) with ESMTP
	id 53/A1-08316-80120245; Mon, 22 Sep 2014 13:15:53 +0000
In-Reply-To: <CAAs=0-0jNfgkBVoQaV+GWBRCu=-yLteJU83+4EbVin4a5fqTGg@mail.gmail.com>
	(Vladimir Kazanov's message of "Sun, 21 Sep 2014 21:55:46 +0300")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (windows-nt)
X-RR-Connecting-IP: 107.14.64.130:25
X-Authority-Analysis: v=2.1 cv=ReIeCjdv c=1 sm=1 tr=0
	a=AppmJ/7ZOOFWL/q6u6u93g==:117 a=AppmJ/7ZOOFWL/q6u6u93g==:17
	a=ayC55rCoAAAA:8 a=9XSUBuVRJI8A:10 a=o_R75loqY_IA:10
	a=9i_RQKNPAAAA:8 a=pGLkceISAAAA:8 a=GFtirlOa14UrT1pzq6oA:9
	a=MSl-tDqOz04A:10
X-Cloudmark-Score: 0
X-detected-operating-system: by eggs.gnu.org: BaiduSpider 
X-Received-From: 107.14.73.227
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:174644
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/174644>

Vladimir Kazanov <vekazanov@gmail.com> writes:

>> Ada mode uses text properties to store parse results; the tokenizer
>> results are part of that, but are not stored separately. I don't see
>> much point in separating the tokenizer from the parser; the tokenizer
>> results are not useful by themselves (at least, not in Ada mode).
>>
>
> First, this is not quite right. Tokenization results can be used, for
> example, for granular syntax highlighting. 

Ada requires semantic parsing, not just tokenizing, for syntax
highlighting (it's a complex language :).

Hmm, I'm not sure what you mean by "granular" here; if you mean "less
than totally accurate", then you are right, we don't need a parser for
that. On the other hand, we don't need a tokenizer, either :).

In Ada, a parser is required to distinguish between these two instances
of 'return':

function Foo (...) return Bar
is begin
  Baz := ...;
  return Baz;
end Foo;

In the first instance, "Bar" is a type; in the second, "Baz" is a
variable. They should have different faces.

Finding the 'function' keyword from just token information is
non-trivial. The parser tags Bar as a type.
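
To make the point concrete, here is a minimal sketch (in Python, only
for brevity; it is not Ada mode's code). It assumes a toy tokenizer
with hypothetical token kinds "keyword", "identifier" and "punct", and
shows that a context-free token stream gives "Bar" and "Baz" exactly
the same description, so a face chosen per token cannot tell the type
from the variable:

```python
import re

# Toy token pattern: words, or the few punctuation tokens used below.
TOKEN_RE = re.compile(r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<punct>:=|\.\.\.|[();]))")
KEYWORDS = {"function", "return", "is", "begin", "end"}  # tiny subset, for illustration

SOURCE = """\
function Foo (...) return Bar
is begin
  Baz := ...;
  return Baz;
end Foo;
"""

def tokenize(text):
    """Return a list of (kind, text) pairs, with no surrounding context."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        word = m.group("word")
        if word is not None:
            kind = "keyword" if word.lower() in KEYWORDS else "identifier"
            tokens.append((kind, word))
        else:
            tokens.append(("punct", m.group("punct")))
        pos = m.end()
    return tokens

tokens = tokenize(SOURCE)

# Collect the token that follows each 'return' keyword.
after_return = [
    tokens[i + 1]
    for i, tok in enumerate(tokens[:-1])
    if tok == ("keyword", "return")
]
```

Both entries of after_return have kind "identifier"; deciding that the
first names a type requires knowing that its 'return' belongs to the
function specification, which is the context the parser supplies.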

> Font Lock basically just
> uses regexps to catch something that looks like
> comments/keywords/whatever. 

But that can be extended by arbitrary functions in
font-lock-add-keywords; Ada mode does that to use the parser information
(when available; it doesn't force a parse just for font-lock).

> Second, it is not a tokenizer I want to build; there is a
> misunderstanding of sorts. It is a helper mode (similar to Font Lock,
> in a way) for keeping token lists up to date all the time, easy and
> fast. User code - the tokenizer itself - will just have to provide an
> interface to the mode (be restartable and supply required restart
> information in resulting tokens). The mode will use the information to
> avoid extra tokenizing.

Ok. Maybe I can use that, and have it run the parser whenever needed.
Just replace "token lists" with "some text properties" in the above; the
helper mode should not care if they are "tokenizer results" or "parser
results".
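
A rough sketch of how such a helper might avoid extra work, under
stated assumptions: it is written in Python rather than Emacs Lisp, the
toy tokenizer is stateless (so the only restart information a token
needs is its end offset), and the names Token, tokenize_from and update
are hypothetical. The edit is described the way an Emacs after-change
function sees it: beg and end in the new text, plus the old length.

```python
import re
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Token:
    start: int  # offset of the first character
    end: int    # offset one past the last character; doubles as restart point
    kind: str
    text: str

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")

def tokenize_from(text, pos):
    """Restartable toy tokenizer: yield the tokens found at or after pos."""
    for m in TOKEN_RE.finditer(text, pos):
        first = m.group()[0]
        kind = "word" if first.isalpha() or first == "_" else "punct"
        yield Token(m.start(), m.end(), kind, m.group())

def shift(tok, delta):
    """Return tok moved by delta characters."""
    return replace(tok, start=tok.start + delta, end=tok.end + delta)

def update(old_tokens, new_text, beg, end, old_len):
    """Return (tokens, n_rescanned) for new_text after one edit."""
    delta = (end - beg) - old_len
    # Tokens that end strictly before the edit cannot have changed.
    keep = 0
    while keep < len(old_tokens) and old_tokens[keep].end < beg:
        keep += 1
    result = list(old_tokens[:keep])
    restart = result[-1].end if result else 0
    # Old tokens past the edited region, at their shifted positions.
    tail = {
        shift(t, delta): i
        for i, t in enumerate(old_tokens)
        if i >= keep and t.start >= beg + old_len
    }
    rescanned = 0
    for tok in tokenize_from(new_text, restart):
        i = tail.get(tok)
        if i is not None:
            # Resynchronized: reuse the rest of the old list, shifted.
            result.extend(shift(t, delta) for t in old_tokens[i:])
            return result, rescanned
        result.append(tok)
        rescanned += 1
    return result, rescanned

old_text = "foo bar baz qux"
old_tokens = list(tokenize_from(old_text, 0))
new_text = "foo barney baz qux"  # "bar" replaced by "barney"
new_tokens, rescanned = update(old_tokens, new_text, beg=4, end=10, old_len=3)
```

Resynchronizing on the first identical token is only valid because this
tokenizer carries no state; a tokenizer with multi-line constructs
would also have to compare the restart state stored in each token. The
same bookkeeping would apply if the stored items were parser-generated
text properties rather than tokens.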

>> I have not noticed any problems with the text properties interface; in
>> particular, storing and retrieving text properties is fast compared to
>> parsing. Ada mode stores about two parse result text properties per
>> source line on average.
>
> I did not know about your mode - and parsers are sort of my hobby :-)
> I will definitely check it out, especially because it uses GLR (it
> really does?!), which can be non-trivial to implement.

Yes, implementing GLR was complicated, and therefore fun :).

-- 
-- Stephe