unofficial mirror of emacs-devel@gnu.org 
* CEDET, DL & parsing thoughts (was Re: Release plans)
@ 2008-08-30  1:53 Eric M. Ludlam
  2008-08-30  2:24 ` Lennart Borgman (gmail)
  0 siblings, 1 reply; 7+ messages in thread
From: Eric M. Ludlam @ 2008-08-30  1:53 UTC
  To: emacs-devel

Hi,  Again no threading info, sorry.

>    Surely XRefactory's big advantage over CEDET is use of an EDG-based
>    parser (which costs money)? So in that sense the restrictions on how
>    the core gcc project develops (whether it can provide suitable dumps
>    of parse trees and the like) are more significant than restrictions on
>    Emacs?
>
>It could be.
>
>Hmm.  There is an architecture to consider:  Imagine dynamically
>loadable parsers for your favorite languages.   Might there be a
>reasonable API design such that a single parsing tool can do both
>incremental parsing / re-parsing and efficient straight-through parsing,
>producing output (in the form of API calls or return values) suitable
>for both building GCC trees and updating text properties and database
>values in an IDE?
>
>
>If so, can tools such as Bison be extended to support generation
>of the incremental (re-) parsing parts (e.g., with suitable ways of
>handling parse errors and recovery in an incremental context)?
>
>
>The resulting "kit" of Emacs w/DL + GCC w/DL +  extended Bison
>could be very fun to play with.

This is in effect what is in CEDET/Semantic now but without the DL.  I
had made a replacement for flex, but more Emacs Lisp centric, and
David Ponce ported bison into Emacs Lisp directly.  This bison port
supports incremental parsing, full parsing, reparsing, and is quite
fast, though not nearly as fast as actual flex/bison/c code.
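
A minimal sketch of the incremental idea, in Python rather than Emacs
Lisp, and not Semantic's actual algorithm: after an edit, tags that lie
entirely before the change stay as they are, tags entirely after it are
shifted, and only the overlapping tags are handed back to the parser.
The Tag type and the apply_edit name are assumptions made up for this
illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Tag:
    name: str
    start: int  # inclusive buffer offset
    end: int    # exclusive buffer offset


def apply_edit(tags: List[Tag], edit_start: int, edit_end: int,
               delta: int) -> Tuple[List[Tag], List[Tag]]:
    """Split TAGS into (still valid, needs reparse) after one edit.

    The edit replaced the old text in [edit_start, edit_end) and changed
    the buffer length by DELTA characters.
    """
    valid: List[Tag] = []
    dirty: List[Tag] = []
    for tag in tags:
        if tag.end <= edit_start:
            # Entirely before the edit: untouched.
            valid.append(tag)
        elif tag.start >= edit_end:
            # Entirely after the edit: same tag, shifted offsets.
            valid.append(replace(tag, start=tag.start + delta,
                                 end=tag.end + delta))
        else:
            # Overlaps the edit: only this region is reparsed.
            dirty.append(tag)
    return valid, dirty
```

A real implementation also has to decide what to do when text is typed
exactly at a tag boundary; this sketch simply treats such an edit as
outside the tag.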

I would assume the concepts in David Ponce's wisent parser generator
could be back-ported into Bison if desired.

If DLLs had existed before we did this, I would have liked to find a
way to feed an actual flex routine from a buffer, and have that feed
into a bison-generated language parser.  Since those create fcns w/ a
single name, that could be hidden in the dll.  I also would have
"borrowed" the gcc .y file as a start.  I obviously didn't do this, nor
did I try the "subprocess" approach, because I wanted "as you type"
syntax checking, which almost exists in the current version of CEDET.
Doing that in an external process is irrational.  (See flymake)

When I started, I really wanted to have a single generic parsing
infrastructure that could do indentation, coloring, and tagging.  As
it stands, I only really had time to focus on one thing, so I picked
that which had not been done, which is the dynamic tagging/completion
part.  This is the same state XRefactory is in.  The main difference,
however, is that XRefactory only does after-you-save tag management.
The integrated parser in CEDET will do as-you-type retagging, plus a
wide range of high-level decorating, and some powerful defun-level
movement, editing and folding.  As an out-of-Emacs process, XRefactory
has on-disk tables of tag usages which CEDET doesn't try to store in
Emacs process memory.

Unrelated to the DLL issue, one thing I think Emacs would benefit
from is a single place where someone working on a "major-mode" could
encode the nature of the language.  Right now there are syntax-tables,
font-lock tables, imenu regexps, etags regexps, and, if you are lucky,
a robust indentation engine with some hairy partial-parsing in it.

I think it would be much nicer (which is why I've worked on it for so
long) to have a single "parser" that knows the language, that would
then be used by generic font lock, tagging, and indenting engines.  I
think the parser David Ponce made is a great place for tagging, and is
likely a great place to also embed the other parts, but it is likely
it will always be considered "slow" compared to the cute short-cuts
you find in font-lock and custom indentors.
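
To make the "single parser, generic engines" idea concrete, here is a
small Python sketch for a made-up toy language ("def NAME:" opens a
block, "end" closes it).  Everything in it (the Line type, parse, tags,
faces, indent) is hypothetical and is not CEDET's API; the point is
only that the language knowledge lives in parse() while the three
consumers are language-independent.

```python
from typing import List, NamedTuple


class Line(NamedTuple):
    kind: str   # "def", "end", or "stmt"
    text: str   # line content with surrounding whitespace removed
    depth: int  # nesting depth at which the line sits


def parse(source: str) -> List[Line]:
    """The one place that knows the (toy) language."""
    parsed: List[Line] = []
    depth = 0
    for raw in source.splitlines():
        text = raw.strip()
        if not text:
            continue
        if text.startswith("def ") and text.endswith(":"):
            parsed.append(Line("def", text, depth))
            depth += 1
        elif text == "end":
            depth = max(depth - 1, 0)
            parsed.append(Line("end", text, depth))
        else:
            parsed.append(Line("stmt", text, depth))
    return parsed


def tags(parsed: List[Line]) -> List[str]:
    """Generic tagging: names of all definitions."""
    return [l.text[4:-1].strip() for l in parsed if l.kind == "def"]


def faces(parsed: List[Line]) -> List[str]:
    """Generic coloring: one face name per line."""
    return ["keyword" if l.kind in ("def", "end") else "default"
            for l in parsed]


def indent(parsed: List[Line], width: int = 2) -> str:
    """Generic indentation: re-indent every line from its depth."""
    return "\n".join(" " * (width * l.depth) + l.text for l in parsed)
```

Supporting another language under this scheme would mean writing a new
parse() only, which is the trade-off discussed above: less duplicated
language knowledge, at the cost of running a full parser where
font-lock and custom indenters use short-cuts.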

Once CEDET is merged into Emacs, I hope to examine some of the speed
issues with others who know more what Emacs' internals are like.  (As
an FYI, all of CEDET's papers should now be in order for this.)

Thanks
Eric

-- 
          Eric Ludlam:                       eric@siege-engine.com
   Siege: www.siege-engine.com          Emacs: http://cedet.sourceforge.net





* Re: CEDET, DL & parsing thoughts (was Re: Release plans)
  2008-08-30  1:53 CEDET, DL & parsing thoughts (was Re: Release plans) Eric M. Ludlam
@ 2008-08-30  2:24 ` Lennart Borgman (gmail)
  2008-08-30  9:04   ` Re[2]: CEDET, DL & parsing thoughts (was " Eric M. Ludlam
  0 siblings, 1 reply; 7+ messages in thread
From: Lennart Borgman (gmail) @ 2008-08-30  2:24 UTC
  To: Eric M. Ludlam; +Cc: emacs-devel

Eric M. Ludlam wrote:
> This is in effect what is in CEDET/Semantic now but without the DL.  I
> had made a replacement for flex, but more Emacs Lisp centric, and
> David Ponce ported bison into Emacs Lisp directly.  This bison port
> supports incremental parsing, full parsing, reparsing, and is quite
> fast, though not nearly as fast as actual flex/bison/c code.
> 
> I would assume the concepts in David Ponce's wisent parser generator
> could be back-ported into Bison if desired.

So (if I understand correctly from the little I know about this), a bit
sadly, if there were suitable flex and bison dlls these could be used
instead (if David's specials were backported too) and would be quite a
bit faster.

This does not relate to the general question of loading DLLs. This is a
special case that I suppose Richard would approve. (I have no idea, but
I guess there are no flex and bison dlls available, or are there?)

However, for new languages all this also requires writing language-
specific bison grammars. Is this perhaps a big job that just a few
people have insight into how to do?

> When I started, I really wanted to have a single generic parsing
> infrastructure that could do indentation, coloring, and tagging.

nxml-mode does that, but on its own of course.

> Once CEDET is merged into Emacs, I hope to examine some of the speed
> issues with others who know more what Emacs' internals are like.  (As
> an FYI, all of CEDET's papers should now be in order for this.)

Great. Then we can hope that it will be easy to get started using CEDET.

> Thanks
> Eric
> 





* Re[2]: CEDET, DL & parsing thoughts (was Release plans)
  2008-08-30  2:24 ` Lennart Borgman (gmail)
@ 2008-08-30  9:04   ` Eric M. Ludlam
  2008-09-02 14:13     ` Richard M. Stallman
  0 siblings, 1 reply; 7+ messages in thread
From: Eric M. Ludlam @ 2008-08-30  9:04 UTC
  To: Lennart Borgman (gmail); +Cc: emacs-devel

>>> "Lennart Borgman (gmail)" <lennart.borgman@gmail.com> seems to think that:
>Eric M. Ludlam wrote:
>> This is in effect what is in CEDET/Semantic now but without the DL.  I
>> had made a replacement for flex, but more Emacs Lisp centric, and
>> David Ponce ported bison into Emacs Lisp directly.  This bison port
>> supports incremental parsing, full parsing, reparsing, and is quite
>> fast, though not nearly as fast as actual flex/bison/c code.
>> 
>> I would assume the concepts in David Ponce's wisent parser generator
>> could be back-ported into Bison if desired.
>
>So (if I understand correctly from the little I know about this), a bit
>sadly, if there were suitable flex and bison dlls these could be used
>instead (if David's specials were backported too) and would be quite a
>bit faster.

I can imagine how it *might* be done, though to be honest, it may be
that having "eval" actions in the real bison generated parser might be
all it takes to slow it down.  I would not claim this as an area of
expertise.

>This does not relate to the general question of loading DLLs. This is a
>special case that I suppose Richard would approve. (I have no idea, but
>I guess there are no flex and bison dlls available, or are there?)

It would work like this:

1) lex/grammar file -> flex & bison -> c-files
2) compile flex-bison output + generic interface -> dl
3) load dl, bind generated parser commands to fcns
4) bind those fcns to generic CEDET/Semantic calls
5) Semantic infrastructure calls into the dll for parsing goodness.
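
Steps 3 to 5 amount to a dispatch table.  The Python sketch below
shows that shape only; it loads no real library, and register_parser,
parse_buffer, toy_c_parser and the tag layout are all hypothetical
names invented for the illustration.  An ordinary function stands in
for the entry point that a flex/bison-built dl would export.

```python
from typing import Callable, Dict, List, Tuple

# A tag is (name, kind, start, end); this layout is an assumption.
Tag = Tuple[str, str, int, int]
ParseFn = Callable[[str], List[Tag]]

# Step 4: generic table from mode name to a parser entry point.
_parsers: Dict[str, ParseFn] = {}


def register_parser(mode: str, fn: ParseFn) -> None:
    """Bind a parser entry point (step 3) to the generic interface."""
    _parsers[mode] = fn


def parse_buffer(mode: str, text: str) -> List[Tag]:
    """Step 5: the infrastructure calls whichever parser MODE is bound to."""
    try:
        parser = _parsers[mode]
    except KeyError:
        raise LookupError(f"no parser registered for {mode!r}") from None
    return parser(text)


def toy_c_parser(text: str) -> List[Tag]:
    """Stand-in for a generated parser loaded from a dl.

    It only recognizes lines ending in `name() {`, so that the example
    runs without any compiled code.
    """
    found: List[Tag] = []
    pos = 0
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.endswith("() {"):
            head = stripped[:-4].split()
            if head:
                found.append((head[-1], "function", pos, pos + len(line)))
        pos += len(line)
    return found


register_parser("c-mode", toy_c_parser)
```

With a real dl, only the body of the registered function would change;
callers of parse_buffer would not need to know where the parser came
from.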

>However, for new languages all this also requires writing language-
>specific bison grammars. Is this perhaps a big job that just a few
>people have insight into how to do?

My goal was to build a tool that would make it easy for anyone to
write a parser that would feed into an infrastructure that provides
the complex, specialized functionality.

This is true now for what is in CEDET.  There are already 6 grammars
that work pretty well that enthusiasts of a particular language have
written.  (Erlang, python, csharp, javascript, 2 php parsers, and
ruby.)

>> When I started, I really wanted to have a single generic parsing
>> infrastructure that could do indentation, coloring, and tagging.
>
>nxml-mode does that, but on its own of course.

It is my understanding that only nxml and that new javascript mode do
this, making the undertaking a statistically less likely occurrence in
the current infrastructure.

>> Once CEDET is merged into Emacs, I hope to examine some of the speed
>> issues with others who know more what Emacs' internals are like.  (As
>> an FYI, all of CEDET's papers should now be in order for this.)
>
>Great. Then we can hope that it will be easy to get started using CEDET.

I agree.
Eric

-- 
          Eric Ludlam:                       eric@siege-engine.com
   Siege: www.siege-engine.com          Emacs: http://cedet.sourceforge.net





* Re: CEDET, DL & parsing thoughts (was Release plans)
  2008-08-30  9:04   ` Re[2]: CEDET, DL & parsing thoughts (was " Eric M. Ludlam
@ 2008-09-02 14:13     ` Richard M. Stallman
  2008-09-02 14:39       ` CEDET, DL & parsing thoughts joakim
  0 siblings, 1 reply; 7+ messages in thread
From: Richard M. Stallman @ 2008-09-02 14:13 UTC
  To: Eric M. Ludlam; +Cc: lennart.borgman, emacs-devel

    This is true now for what is in CEDET.  There are already 6 grammars
    that work pretty well that enthusiasts of a particular language have
    written.  (Erlang, python, csharp, javascript, 2 php parsers, and
    ruby.)

I note the absence of C and C++.  I guess that's because their grammar
is so complex that the job would be hard to do.  But it is a real
shame to support Microsoft's language, C#, and not support Java.
Can you recruit someone to support Java?

Also we need Emacs Lisp and Scheme?





* Re: CEDET, DL & parsing thoughts
  2008-09-02 14:13     ` Richard M. Stallman
@ 2008-09-02 14:39       ` joakim
  2008-09-02 20:01         ` Re[2]: " Eric M. Ludlam
  0 siblings, 1 reply; 7+ messages in thread
From: joakim @ 2008-09-02 14:39 UTC
  To: rms; +Cc: emacs-devel, lennart.borgman, Eric M. Ludlam

"Richard M. Stallman" <rms@gnu.org> writes:

>     This is true now for what is in CEDET.  There are already 6 grammars
>     that work pretty well that enthusiasts of a particular language have
>     written.  (Erlang, python, csharp, javascript, 2 php parsers, and
>     ruby.)
>
> I note the absence of C and C++.  I guess that's because their grammar
> is so complex that the job would be hard to do.  But it is a real
> shame to support Microsoft's language, C#, and not support Java.
> Can you recruit someone to support Java?
>
> Also we need Emacs Lisp and Scheme?

cedet/semantic/wisent/wisent-java.wy
cedet/semantic/wisent/wisent-java-tags.wy
cedet/semantic/wisent/wisent-awk.wy
cedet/semantic/wisent/wisent-calc.wy
cedet/semantic/wisent/wisent-python.wy
cedet/semantic/wisent/wisent-cim.wy
cedet/semantic/wisent/wisent-c.wy
cedet/semantic/semantic-grammar.wy
cedet/cogre/wisent-dot.wy
cedet/contrib/wisent-ruby.wy
cedet/contrib/wisent-javascript-jv.wy
cedet/contrib/wisent-php.wy
cedet/contrib/wisent-csharp.wy
cedet/srecode/srecode-template.wy


So, I think Eric meant the files in "contrib" whereas Cedet supports all
the grammars above, including Java and C.

(I contributed the javascript grammar, and I intend to move it to the
Cedet core, so it also will be included when Cedet is merged in Emacs
CVS.)

>
-- 
Joakim Verona





* Re[2]: CEDET, DL & parsing thoughts
  2008-09-02 14:39       ` CEDET, DL & parsing thoughts joakim
@ 2008-09-02 20:01         ` Eric M. Ludlam
  2008-09-03  2:41           ` Richard M. Stallman
  0 siblings, 1 reply; 7+ messages in thread
From: Eric M. Ludlam @ 2008-09-02 20:01 UTC
  To: joakim; +Cc: lennart.borgman, rms, emacs-devel

>>> joakim@verona.se seems to think that:
>"Richard M. Stallman" <rms@gnu.org> writes:
>
>>     This is true now for what is in CEDET.  There are already 6 grammars
>>     that work pretty well that enthusiasts of a particular language have
>>     written.  (Erlang, python, csharp, javascript, 2 php parsers, and
>>     ruby.)
>>
>> I note the absence of C and C++.  I guess that's because their grammar
>> is so complex that the job would be hard to do.  But it is a real
>> shame to support Microsoft's language, C#, and not support Java.
>> Can you recruit someone to support Java?
>>
>> Also we need Emacs Lisp and Scheme?
>
>cedet/semantic/wisent/wisent-java.wy
>cedet/semantic/wisent/wisent-java-tags.wy
>cedet/semantic/wisent/wisent-awk.wy
>cedet/semantic/wisent/wisent-calc.wy
>cedet/semantic/wisent/wisent-python.wy
>cedet/semantic/wisent/wisent-cim.wy
>cedet/semantic/wisent/wisent-c.wy
>cedet/semantic/semantic-grammar.wy
>cedet/cogre/wisent-dot.wy
>cedet/contrib/wisent-ruby.wy
>cedet/contrib/wisent-javascript-jv.wy
>cedet/contrib/wisent-php.wy
>cedet/contrib/wisent-csharp.wy
>cedet/srecode/srecode-template.wy
>
>
>So, I think Eric meant the files in "contrib" whereas Cedet supports all
>the grammars above, including Java and C.

Yes.  I was trying to point out that it is easy to support a language
using CEDET/Semantic, as shown by the existence of the 6 grammars which
*I* did not write.  The question I was trying to answer was whether only
super hackers could support a language, so I was pointing out that
there are 6 regular hackers who have been successful.  There are
only two such hand-crafted, all-Elisp super-parsers that I know of,
thus justifying the model I had proposed.  This, of course, assumes I
am a super-hacker.  I would rather think I'm just more familiar with
CEDET internals than most. :)

I and other core CEDET developers have, as joakim pointed out above,
implemented many other grammars.  Missing from joakim's list are:

cedet/semantic/bovine/make.by
cedet/semantic/bovine/scheme.by
cedet/semantic/bovine/c.by     ; also has c++

plus a hand-written grammar for Elisp.  (I skipped the grammar
compilation step, and just typed in compiler compiler style output.)

There are also some regexp based grammars for html and texinfo, and
grammars for the grammar files.

>(I contributed the javascript grammar, and I intend to move it to the
>Cedet core, so it also will be included when Cedet is merged in Emacs
>CVS.)
  [ ... ]

If you have signed this stuff over, let me know and I'll move it.

Thanks
Eric





* Re: CEDET, DL & parsing thoughts
  2008-09-02 20:01         ` Re[2]: " Eric M. Ludlam
@ 2008-09-03  2:41           ` Richard M. Stallman
  0 siblings, 0 replies; 7+ messages in thread
From: Richard M. Stallman @ 2008-09-03  2:41 UTC
  To: Eric M. Ludlam; +Cc: lennart.borgman, joakim, emacs-devel

    I, and other core CEDET developers have, as joakim pointed out above,
    implemented many other grammars.  Missing from joakim's list are:

    cedet/semantic/bovine/make.by
    cedet/semantic/bovine/scheme.by
    cedet/semantic/bovine/c.by     ; also has c++

That is good news.





