From: Stephen Leake
Newsgroups: gmane.emacs.devel
Subject: Re: Tokenizing
Date: Mon, 22 Sep 2014 08:15:51 -0500
Message-ID: <85zjdrkd88.fsf@stephe-leake.org>
References: <85ha01dm5u.fsf@stephe-leake.org>
To: emacs-devel@gnu.org
In-Reply-To: (Vladimir Kazanov's message
  of "Sun, 21 Sep 2014 21:55:46 +0300")

Vladimir Kazanov writes:

>> Ada mode uses text properties to store parse results; the tokenizer
>> results are part of that, but are not stored separately. I don't see
>> much point in separating the tokenizer from the parser; the tokenizer
>> results are not useful by themselves (at least, not in Ada mode).
>>
>
> First, this is not quite right. Tokenization results can be used, for
> example, for granular syntax highlighting.

Ada requires semantic parsing, not just tokenizing, for syntax
highlighting (it's a complex language :).

Hmm, I'm not sure what you mean by "granular" here; if you mean "less
than totally accurate", then you are right, we don't need a parser for
that. On the other hand, we don't need a tokenizer, either :).

In Ada, a parser is required to distinguish between these two
instances of 'return':

   function Foo (...) return Bar is
   begin
      Baz := ...;
      return Baz;
   end Foo;

In the first instance, "Bar" is a type; in the second, "Baz" is a
variable. They should have different faces.

Finding the 'function' keyword from just token information is
non-trivial. The parser tags Bar as a type.
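To make the 'return' example concrete, here is a minimal Python sketch. It is not Ada mode's code, which is Emacs Lisp. Both 'return' tokens look identical in the token stream. What differs is one bit of parser state: whether we are still in the function specification, between 'function' and 'is'. The names `tokenize` and `classify_return_targets`, and the single `in_spec` flag, are hypothetical simplifications. Real Ada needs a real parser to handle nested subprograms, access-to-function types, and so on.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexer: identifiers/keywords, ":=", "...", and ( ) ;
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|:=|\.\.\.|[();]")


def tokenize(src: str) -> List[str]:
    """Split source text into a flat list of token strings (whitespace skipped)."""
    return TOKEN_RE.findall(src)


def classify_return_targets(tokens: List[str]) -> List[Tuple[str, str]]:
    """Label the token after each 'return' as 'type' or 'expression'.

    Carries one bit of parser state: in_spec is True between 'function'
    and 'is' (the function specification). This is a deliberate
    oversimplification of Ada syntax, for illustration only.
    """
    in_spec = False
    result = []
    for i, tok in enumerate(tokens):
        low = tok.lower()  # Ada keywords are case-insensitive
        if low == "function":
            in_spec = True
        elif low == "is":
            in_spec = False
        elif low == "return" and i + 1 < len(tokens):
            kind = "type" if in_spec else "expression"
            result.append((tokens[i + 1], kind))
    return result


if __name__ == "__main__":
    src = """
function Foo (...) return Bar is
begin
   Baz := ...;
   return Baz;
end Foo;
"""
    print(classify_return_targets(tokenize(src)))
    # prints [('Bar', 'type'), ('Baz', 'expression')]
```

The sketch always scans from the start of the text. In an editor, the hard part is recovering that state when re-highlighting starts in the middle of a buffer. That is where finding the 'function' keyword stops being trivial.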

> Font Lock basically just uses regexps to catch something that looks
> like comments/keywords/whatever.

But that can be extended by arbitrary functions in
font-lock-add-keywords; Ada mode does that to use the parser
information (when available; it doesn't force a parse just for
font-lock).

> Second, it is not a tokenizer I want to build; there is a
> misunderstanding of sorts. It is a helper mode (similar to Font Lock,
> in a way) for keeping token lists up to date all the time, easy and
> fast. User code - the tokenizer itself - will just have to provide an
> interface to the mode (be restartable and supply required restart
> information in resulting tokens). The mode will use the information to
> avoid extra tokenizing.

Ok. Maybe I can use that, and have it run the parser whenever
needed. Just replace "token lists" with "some text properties" in the
above; the helper mode should not care if they are "tokenizer results"
or "parser results".

>> I have not noticed any problems with the text properties interface; in
>> particular, storing and retrieving text properties is fast compared to
>> parsing. Ada mode stores about two parse result text properties per
>> source line on average.
>
> I did not know about your mode - and parsers are sort of my hobby :-)
> I will definitely check it out, especially because it uses GLR (it
> really does?!), which can be non-trivial to implement.

Yes, implementing GLR was complicated, and therefore fun :).

-- 
-- Stephe