From: "Eric M. Ludlam"
Newsgroups: gmane.emacs.devel
Subject: CEDET, DL & parsing thoughts (was Re: Release plans)
Date: Fri, 29 Aug 2008 21:53:10 -0400
Message-ID: <200808300153.m7U1rAxA027402@projectile.siege-engine.com>
To: emacs-devel@gnu.org

Hi,

Again no threading info, sorry.

> Surely XRefactory's big advantage over CEDET is use of an EDG-based
> parser (which costs money)? So in that sense the restrictions on how
> the core gcc project develops (whether it can provide suitable dumps
> of parse trees and the like) are more significant than restrictions on
> Emacs?
>
>It could be.
>
>Hmm. There is an architecture to consider: Imagine dynamically
>loadable parsers for your favorite languages. Might there be a
>reasonable API design such that a single parsing tool can do both
>incremental parsing / re-parsing and efficient straight-through parsing,
>producing output (in the form of API calls or return values) suitable
>for both building GCC trees and updating text properties and database
>values in an IDE?
>
>
>If so, can tools such as Bison be extended to support generation
>of the incremental (re-) parsing parts (e.g., with suitable ways of
>handling parse errors and recovery in an incremental context)?
>
>
>The resulting "kit" of Emacs w/DL + GCC w/DL + extended Bison
>could be very fun to play with.

This is in effect what is in CEDET/Semantic now but without the DL. I
had made a replacement for flex, but more Emacs Lisp centric, and David
Ponce ported bison into Emacs Lisp directly.
This bison port supports incremental parsing, full parsing, and
reparsing, and is quite fast, though not nearly as fast as actual
flex/bison/C code. I would assume the concepts in David Ponce's wisent
parser generator could be back-ported into Bison if desired.

If DLLs had existed before we did this, I would have liked to find a
way to feed an actual flex routine from a buffer, and have that feed
into a bison-generated parser. Since those tools create functions with
a single name, that could be hidden in the DLL. I also would have
"borrowed" the gcc .y file as a start.

I obviously didn't do this, nor did I try the "subprocess" approach,
because I wanted "as you type" syntax checking, which almost exists in
the current version of CEDET. Doing that in an external program is
irrational. (See flymake.)

When I started, I really wanted to have a single generic parsing
infrastructure that could do indentation, coloring, and tagging. As it
stands, I only really had time to focus on one thing, so I picked that
which had not been done, which is the dynamic tagging/completion part.
This is the same state XRefactory is in. The main difference, however,
is that XRefactory only does after-you-save tag management. The
integrated parser in CEDET will do as-you-type retagging, plus a wide
range of high-level decorating, and some powerful defun-level movement,
editing, and folding. As an out-of-Emacs process, XRefactory has
on-disk tables of tag usages, which CEDET doesn't try to store in Emacs
process memory.

Unrelated to the DLL issue, one thing I think Emacs would benefit from
is a single place where someone working on a "major-mode" could encode
the nature of the language. Right now there are syntax tables,
font-lock tables, imenu regexps, etags regexps, and, if you are lucky,
a robust indentation engine with some hairy partial-parsing in it.
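As a rough illustration of the "single place" idea (a sketch only,
written in Python rather than Emacs Lisp, and not how CEDET/Semantic or
wisent is implemented), one language description can feed several
generic engines. Everything named here is hypothetical: the `Rule`
record, the toy brace language, and the `highlight`, `tags`, and
`indent_levels` functions are assumptions made for the example, and a
regexp tokenizer stands in for a real parser.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str                   # token kind
    pattern: str                # regular expression matching the token
    face: Optional[str] = None  # consumed by the highlighter
    opens: int = 0              # nesting change, consumed by the indenter


# Hypothetical description of a toy brace language, written once.
TOY_LANGUAGE = [
    Rule("comment", r"#[^\n]*", face="comment"),
    Rule("keyword", r"\b(?:def|return)\b", face="keyword"),
    Rule("name", r"[A-Za-z_]\w*"),
    Rule("open", r"\{", opens=+1),
    Rule("close", r"\}", opens=-1),
    Rule("newline", r"\n"),
]


def tokenize(rules, text):
    """Yield (rule, start, end, lexeme) for every token found in text."""
    master = re.compile("|".join(f"(?P<{r.name}>{r.pattern})" for r in rules))
    by_name = {r.name: r for r in rules}
    for m in master.finditer(text):
        yield by_name[m.lastgroup], m.start(), m.end(), m.group()


def highlight(rules, text):
    """Generic coloring engine: (start, end, face) for tokens with a face."""
    return [(s, e, r.face) for r, s, e, _ in tokenize(rules, text) if r.face]


def tags(rules, text):
    """Generic tagging engine: (name, position) for each name after 'def'."""
    found, previous = [], None
    for r, start, _, lexeme in tokenize(rules, text):
        if r.name == "name" and previous == "def":
            found.append((lexeme, start))
        previous = lexeme
    return found


def indent_levels(rules, text):
    """Generic indenting engine: nesting depth to use for each line."""
    depth, levels, at_line_start = 0, [], True
    for r, _, _, _ in tokenize(rules, text):
        if r.name == "newline":
            if at_line_start:       # line with no tokens keeps current depth
                levels.append(depth)
            at_line_start = True
            continue
        if at_line_start:
            # A line that starts with a closer is dedented by one level.
            levels.append(max(0, depth + min(r.opens, 0)))
            at_line_start = False
        depth = max(0, depth + r.opens)
    return levels


SAMPLE = "def area {\n  # compute\n  return w\n}\ndef perimeter {\n}\n"

if __name__ == "__main__":
    print(highlight(TOY_LANGUAGE, SAMPLE))
    print(tags(TOY_LANGUAGE, SAMPLE))
    print(indent_levels(TOY_LANGUAGE, SAMPLE))
```

The point of the sketch is only that the three engines share one
description, so adding a keyword or a new bracket pair means editing
`TOY_LANGUAGE` and nothing else. A real version would need an actual
grammar and incremental reparsing rather than a flat token scan.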
I think it would be much nicer (which is why I've worked on it for so
long) to have a single "parser" that knows the language, and that would
then be used by generic font-lock, tagging, and indenting engines. I
think the parser David Ponce made is a great place for tagging, and is
likely a great place to also embed the other parts, but it is likely it
will always be considered "slow" compared to the cute shortcuts you
find in font-lock and custom indenters.

Once CEDET is merged into Emacs, I hope to examine some of the speed
issues with others who know more about what Emacs' internals are like.
(As an FYI, all of CEDET's papers should now be in order for this.)

Thanks
Eric

-- 
Eric Ludlam: eric@siege-engine.com
Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net