CEDET merge question

unofficial mirror of emacs-devel@gnu.org 

* CEDET merge question
@ 2009-09-05 16:28 Chong Yidong
  2009-09-05 17:22 ` David Engster
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Chong Yidong @ 2009-09-05 16:28 UTC (permalink / raw)
  To: emacs-devel

I have a question about CEDET that hopefully someone on this list, who
has more experience using CEDET than me, can help answer (I've been
corresponding with Eric Ludlam, but he's gone on vacation).

The Semantic parser appears to have two major "back-ends", bovine and
wisent, which are used to generate Semantic tags.  Does anyone know how
crucial these packages are, and whether one or the other (or both) could
be dropped or somehow trimmed down?

I ask because the CEDET merge already involves an uncomfortably large
amount of code, and it's rather dismaying to see these two big code
trees "embedded" in subdirectories of Semantic.  (Wisent, for instance,
appears to be an entire Elisp reimplementation of Bison...)

(The CVS branch I'm using for the CEDET merge is not yet suitable for
general testing; I'll inform the list when it's ready.)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: CEDET merge question
  2009-09-05 16:28 CEDET merge question Chong Yidong
@ 2009-09-05 17:22 ` David Engster
  2009-09-05 20:53   ` Chong Yidong
  2009-09-06 15:37 ` Richard Stallman
  2009-09-08  8:11 ` joakim
  2 siblings, 1 reply; 29+ messages in thread
From: David Engster @ 2009-09-05 17:22 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:
> The Semantic parser appears to have two major "back-ends", bovine and
> wisent, which are used to generate Semantic tags.  Does anyone know how
> crucial these packages are, and whether one or the other (or both) could
> be dropped or somehow trimmed down?

I think it depends on whether people should be able to edit and
compile the grammars themselves, using only Emacs proper.

The bovine/wisent parsers and major-modes are crucial for development,
but I think they are not necessarily needed for the resulting parser; I
may be wrong though, especially when it comes to the Wisent parser,
which I'm not familiar with at all.

For example, the file semantic/bovine/c.by is the Bovine grammar for
C/C++ parsing. During CEDET's make process, the 'bovine' code generates
the file semantic/bovine/semantic-c-by.el, which is the resulting C(++)
parser in Emacs Lisp. This file is then required by
semantic-c.el. Therefore, I would think that including the resulting
semantic-c-by.el should be enough for the C parser to work.

As mentioned above, there are also the major-modes for bovine/wisent in
CEDET (bovine-grammar.el, wisent-grammar.el) which are needed for
writing and debugging the grammar files. I think those would also not
necessarily be needed in Emacs. However, if people would like to extend
or fix grammar files (or write new ones), they would then have to get
CEDET from CVS.

> (Wisent, for instance, appears to be an entire Elisp reimplementation
> of Bison...)

Yes, it is exactly that. :-)

Regards,
David


* Re: CEDET merge question
  2009-09-05 17:22 ` David Engster
@ 2009-09-05 20:53   ` Chong Yidong
  2009-09-05 23:08     ` David Engster
  0 siblings, 1 reply; 29+ messages in thread
From: Chong Yidong @ 2009-09-05 20:53 UTC (permalink / raw)
  To: David Engster; +Cc: emacs-devel

David Engster <deng@randomsample.de> writes:

> The bovine/wisent parsers and major-modes are crucial for development,
> but I think they are not necessarily needed for the resulting parser; I
> may be wrong though, especially when it comes to the Wisent parser,
> which I'm not familiar with at all.
>
> For example, the file semantic/bovine/c.by is the Bovine grammar for
> C/C++ parsing. During CEDET's make process, the 'bovine' code generates
> the file semantic/bovine/semantic-c-by.el, which is the resulting C(++)
> parser in Emacs Lisp. This file is then required by
> semantic-c.el. Therefore, I would think that including the resulting
> semantic-c-by.el should be enough for the C parser to work.

I see.  I think it's better for us to merge just the generated Lisp
grammar files, leaving the grammar development for upstream.  It's an
awful lot of infrastructure to pull in, considering that CEDET
development won't be carried out in our repository anyway.

Do you know if the bovine and wisent parsers are mutually replaceable?
For instance, the default parser seems to be bovine; would it be a big
deal if we included just the bovine parser?


* Re: CEDET merge question
  2009-09-05 20:53   ` Chong Yidong
@ 2009-09-05 23:08     ` David Engster
  0 siblings, 0 replies; 29+ messages in thread
From: David Engster @ 2009-09-05 23:08 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:
> Do you know if the bovine and wisent parsers are mutually replaceable?
> For instance, the default parser seems to be bovine; would it be a big
> deal if we included just the bovine parser?

I don't think it makes much sense to include just the Bovine parser. If
you look at

http://cedet.sourceforge.net/languagesupport.shtml

you'll see the currently supported languages in CEDET, together with
their current status regarding completion, project support, etc. The
grammar column shows the type of grammar, "LL" or "LALR". The former is
done with Bovine, the latter with Wisent. So Bovine isn't really
the default, but it's the older one, and especially the C/C++ support is
pretty stable by now (there's also a Wisent parser for C, but it doesn't
support C++ and AFAIK is currently not used). Some of the Wisent
grammars are in the contrib directory, which probably means they
basically work, but lack further infrastructure in Semantic.

But I think the Wisent grammars work pretty much the same as the Bovine
ones, i.e., during CEDET's compilation a file 'wisent-<LANG>-wy.el'
is created, which contains the actual parser.

-David


* Re: CEDET merge question
  2009-09-05 16:28 CEDET merge question Chong Yidong
  2009-09-05 17:22 ` David Engster
@ 2009-09-06 15:37 ` Richard Stallman
  2009-09-06 17:46   ` Ken Raeburn
  2009-09-08  8:11 ` joakim
  2 siblings, 1 reply; 29+ messages in thread
From: Richard Stallman @ 2009-09-06 15:37 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel

    I ask because the CEDET merge already involves an uncomfortably large
    amount of code, and it's rather dismaying to see these two big code
    trees "embedded" in subdirectories of Semantic.  (Wisent, for instance,
    appears to be an entire Elisp reimplementation of Bison...)

Is it possible to use Bison itself rather than implement the
same functionality differently?  Or perhaps add an option
to Bison to output its data in whatever format is convenient?





* Re: CEDET merge question
  2009-09-06 15:37 ` Richard Stallman
@ 2009-09-06 17:46   ` Ken Raeburn
  2009-09-06 21:11     ` David Engster
  2009-09-07 13:34     ` Richard Stallman
  0 siblings, 2 replies; 29+ messages in thread
From: Ken Raeburn @ 2009-09-06 17:46 UTC (permalink / raw)
  To: rms; +Cc: Chong Yidong, emacs-devel

On Sep 6, 2009, at 11:37, Richard Stallman wrote:
>    I ask because the CEDET merge already involves an uncomfortably large
>    amount of code, and it's rather dismaying to see these two big code
>    trees "embedded" in subdirectories of Semantic.  (Wisent, for instance,
>    appears to be an entire Elisp reimplementation of Bison...)
>
> Is it possible to use Bison itself rather than implement the
> same functionality differently?  Or perhaps add an option
> to Bison to output its data in whatever format is convenient?

Guile is also using a translation/reimplementation of Bison in  
Scheme.  I haven't looked at the CEDET code, but Guile's version wants  
the grammar input using Scheme (s-expression) syntax.






* Re: CEDET merge question
  2009-09-06 17:46   ` Ken Raeburn
@ 2009-09-06 21:11     ` David Engster
  2009-09-06 22:26       ` Ken Raeburn
  2009-09-07 13:33       ` Richard Stallman
  2009-09-07 13:34     ` Richard Stallman
  1 sibling, 2 replies; 29+ messages in thread
From: David Engster @ 2009-09-06 21:11 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: Chong Yidong, rms, emacs-devel

Ken Raeburn <raeburn@raeburn.org> writes:
> On Sep 6, 2009, at 11:37, Richard Stallman wrote:
>> Is it possible to use Bison itself rather than implement the
>> same functionality differently?  Or perhaps add an option
>> to Bison to output its data in whatever format is convenient?
>
> Guile is also using a translation/reimplementation of Bison in Scheme.
> I haven't looked at the CEDET code, but Guile's version wants the
> grammar input using Scheme (s-expression) syntax.

CEDET uses Bison grammars which are extended through "Optional Lambda
Expressions" (OLE). They produce the actual tags, which are the basic
objects resulting from the parsing stage. I don't think this can be
easily replaced by Bison itself or Guile.
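
As a rough illustration of rules that carry optional actions, here is a
toy sketch in Python. It is not CEDET code and not the Bovine grammar
format: the rule table, the token names and the make_tag helper are all
invented for the example. It only shows the general shape, where a rule
may have an attached function that builds a tag from the matched tokens,
and a rule without one simply consumes input.

```python
# Illustrative only: a toy rule table in the spirit of a grammar whose
# rules carry optional actions. None of these names come from CEDET.

def make_tag(name, tag_class, **attributes):
    """Build a tag as a (name, class, attributes) tuple."""
    return (name, tag_class, attributes)


# Each rule: (token types to match, optional action).
# The action receives the matched token values and returns a tag.
RULES = [
    (("TYPE", "NAME", "SEMI"),
     lambda vals: make_tag(vals[1], "variable", type=vals[0])),
    (("TYPE", "NAME", "LPAREN", "RPAREN", "SEMI"),
     lambda vals: make_tag(vals[1], "function", type=vals[0])),
    (("SEMI",), None),  # no action: the rule matches but yields no tag
]


def parse(tokens):
    """Match rules against a list of (type, value) tokens; collect tags."""
    tags = []
    pos = 0
    while pos < len(tokens):
        for pattern, action in RULES:
            window = tokens[pos:pos + len(pattern)]
            if tuple(kind for kind, _ in window) == pattern:
                if action is not None:
                    tags.append(action([value for _, value in window]))
                pos += len(pattern)
                break
        else:
            raise SyntaxError(f"no rule matches at token {pos}")
    return tags


tokens = [
    ("TYPE", "int"), ("NAME", "x"), ("SEMI", ";"),
    ("TYPE", "int"), ("NAME", "main"),
    ("LPAREN", "("), ("RPAREN", ")"), ("SEMI", ";"),
]
tags = parse(tokens)
```

The point of the sketch is that the tag-building code lives next to the
rule it belongs to, which is why the action language matters to whatever
tool processes the grammar.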

But there's really not that much additional framework associated with
Bovine. In the 'bovine' subdirectory, there are the actual grammar
files (like c.by, erlang.by, etc.), and the major- and debugging-modes
(bovine-grammar.el, bovine-debug.el). I think they are really only
needed for developing and testing grammars.

The file semantic-bovine.el contains the parsing core and is
crucial. Then, there are files which deal with language-specific issues,
for example semantic-c.el, semantic-erlang.el, semantic-java.el,
etc. These files contain overrides and helper functions to deal with
stuff which usually differs between languages, like smart completion,
local variables, namespaces and scoping issues, special preprocessor
macros, etc. These files are only crucial for parsing the named
language.

The file semantic-gcc.el sets up stuff like system include paths for
C/C++ by looking at the local gcc installation; it's very helpful for
people using gcc.

The files semantic-skel.el and skeleton.by are just there to get people
started developing their own grammars and overrides; I think they can be
safely dropped.

As mentioned in my previous mail, the files semantic-<LANG>-by.el result
from the compilation of the *.by files and could probably just be
provided 'as is' in Emacs, without the additional grammar developing
framework (this also implies that files could not just get synced from
CEDET CVS to Emacs, but would need a compilation step in between). I
can't speak for Eric here, of course. Maybe there's some not-so-obvious
dependency, or another good reason to include the full grammar
framework.

Regards,
David


* Re: CEDET merge question
  2009-09-06 21:11     ` David Engster
@ 2009-09-06 22:26       ` Ken Raeburn
  2009-09-07 13:33       ` Richard Stallman
  1 sibling, 0 replies; 29+ messages in thread
From: Ken Raeburn @ 2009-09-06 22:26 UTC (permalink / raw)
  To: David Engster; +Cc: Chong Yidong, rms, emacs-devel

On Sep 6, 2009, at 17:11, David Engster wrote:
> Ken Raeburn <raeburn@raeburn.org> writes:
>> On Sep 6, 2009, at 11:37, Richard Stallman wrote:
>>> Is it possible to use Bison itself rather than implement the
>>> same functionality differently?  Or perhaps add an option
>>> to Bison to output its data in whatever format is convenient?
>>
>> Guile is also using a translation/reimplementation of Bison in Scheme.
>> I haven't looked at the CEDET code, but Guile's version wants the
>> grammar input using Scheme (s-expression) syntax.
>
> CEDET uses Bison grammars which are extended through "Optional Lambda
> Expressions" (OLE). They produce the actual tags, which are the basic
> objects resulting from the parsing stage. I don't think this can be
> easily replaced by Bison itself or Guile.

Sorry, I didn't mean to suggest replacing it with Guile, more that, if  
the requirements were similar enough, Bison extensions to support both  
CEDET and Guile might be possible.  But if you're extending the  
grammar with Lisp code, that may not be feasible....

Ken





* Re: CEDET merge question
  2009-09-06 21:11     ` David Engster
  2009-09-06 22:26       ` Ken Raeburn
@ 2009-09-07 13:33       ` Richard Stallman
  2009-09-12 12:49         ` Eric M. Ludlam
  1 sibling, 1 reply; 29+ messages in thread
From: Richard Stallman @ 2009-09-07 13:33 UTC (permalink / raw)
  To: David Engster; +Cc: cyd, raeburn, emacs-devel

    CEDET uses Bison grammars which are extended through "Optional Lambda
    Expressions" (OLE). They produce the actual tags, which are the basic
    objects resulting from the parsing stage. I don't think this can be
    easily replaced by Bison itself or Guile.

Why is it hard to add these to Bison?
It can handle embedded C code, so why not embedded Lisp code?
It should be straightforward to make such changes.





* Re: CEDET merge question
  2009-09-06 17:46   ` Ken Raeburn
  2009-09-06 21:11     ` David Engster
@ 2009-09-07 13:34     ` Richard Stallman
  1 sibling, 0 replies; 29+ messages in thread
From: Richard Stallman @ 2009-09-07 13:34 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: cyd, emacs-devel

    > Is it possible to use Bison itself rather than implement the
    > same functionality differently?  Or perhaps add an option
    > to Bison to output its data in whatever format is convenient?

    Guile is also using a translation/reimplementation of Bison in  
    Scheme.

That may be wasteful too, but it is a separate issue.





* Re: CEDET merge question
  2009-09-05 16:28 CEDET merge question Chong Yidong
  2009-09-05 17:22 ` David Engster
  2009-09-06 15:37 ` Richard Stallman
@ 2009-09-08  8:11 ` joakim
  2009-09-08  9:07   ` Lennart Borgman
  2009-09-08 14:41   ` Chong Yidong
  2 siblings, 2 replies; 29+ messages in thread
From: joakim @ 2009-09-08  8:11 UTC (permalink / raw)
  To: Chong Yidong; +Cc: Tom Tromey, emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:

> I have a question about CEDET that hopefully someone on this list, who
> has more experience using CEDET than me, can help answer (I've been
> corresponding with Eric Ludlam, but he's gone on vacation).
>
> The Semantic parser appears to have two major "back-ends", bovine and
> wisent, which are used to generate Semantic tags.  Does anyone know how
> crucial these packages are, and whether one or the other (or both) could
> be dropped or somehow trimmed down?
>
> I ask because the CEDET merge already involves an uncomfortably large
> amount of code, and it's rather dismaying to see these two big code
> trees "embedded" in subdirectories of Semantic.  (Wisent, for instance,
> appears to be an entire Elisp reimplementation of Bison...)
>

Emacs hackers would still need easy access to these tools. Maybe this is
a further case for including something like Tom Tromey's ELPA in Emacs?
If we had something like that by default it wouldn't be a big deal to
distribute tools like these to Emacs hackers.



>
> (The CVS branch I'm using for the CEDET merge is not yet suitable for
> general testing; I'll inform the list when it's ready.)
>
-- 
Joakim Verona





* Re: CEDET merge question
  2009-09-08  8:11 ` joakim
@ 2009-09-08  9:07   ` Lennart Borgman
  2009-09-08  9:09     ` Lennart Borgman
  2009-09-08 14:41   ` Chong Yidong
  1 sibling, 1 reply; 29+ messages in thread
From: Lennart Borgman @ 2009-09-08  9:07 UTC (permalink / raw)
  To: joakim; +Cc: Tom Tromey, Chong Yidong, emacs-devel

On Tue, Sep 8, 2009 at 10:11 AM, <joakim@verona.se> wrote:
>
> Emacs hackers would still need easy access to these tools. Maybe this is
> a further case for including something like Tom Tromey's ELPA in Emacs?
> If we had something like that by default it wouldn't be a big deal to
> distribute tools like these to Emacs hackers.


But does ELPA have the necessary version info structure?





* Re: CEDET merge question
  2009-09-08  9:07   ` Lennart Borgman
@ 2009-09-08  9:09     ` Lennart Borgman
  0 siblings, 0 replies; 29+ messages in thread
From: Lennart Borgman @ 2009-09-08  9:09 UTC (permalink / raw)
  To: joakim; +Cc: Tom Tromey, Chong Yidong, emacs-devel

On Tue, Sep 8, 2009 at 11:07 AM, Lennart
Borgman<lennart.borgman@gmail.com> wrote:
> On Tue, Sep 8, 2009 at 10:11 AM, <joakim@verona.se> wrote:
>>
>> Emacs hackers would still need easy access to these tools. Maybe this is
>> a further case for including something like Tom Tromey's ELPA in Emacs?
>> If we had something like that by default it wouldn't be a big deal to
>> distribute tools like these to Emacs hackers.
>
>
> But does ELPA have the necessary version info structure?

Eh, sorry. I mean wouldn't it be better to have a tool to install
directly from the repository where this part of CEDET is? I think it
would be rather easy to write such a tool which accesses the web
interface of the repository just for downloading.





* Re: CEDET merge question
  2009-09-08  8:11 ` joakim
  2009-09-08  9:07   ` Lennart Borgman
@ 2009-09-08 14:41   ` Chong Yidong
  2009-09-08 15:10     ` joakim
  2009-09-08 21:21     ` Romain Francoise
  1 sibling, 2 replies; 29+ messages in thread
From: Chong Yidong @ 2009-09-08 14:41 UTC (permalink / raw)
  To: joakim; +Cc: Tom Tromey, emacs-devel

joakim@verona.se writes:

> Emacs hackers would still need easy access to these tools. Maybe this is
> a further case for including something like Tom Tromey's ELPA in Emacs?

Eric's still going to develop CEDET in his repository, so if you'll be
hacking on CEDET, I think you should use the version of CEDET he has
installed, instead of the version that will eventually be bundled with
Emacs.

This is a practical matter: since CEDET is such a large and complicated
package, we shouldn't make changes directly to our copy of it, apart
from those that are necessary to adapt it to Emacs' conventions and
build system (which is what I've been working on).  Instead, changes
should be applied first to Eric's repository, then merged back into our
tree.


* Re: CEDET merge question
  2009-09-08 14:41   ` Chong Yidong
@ 2009-09-08 15:10     ` joakim
  2009-09-08 17:18       ` Chong Yidong
  2009-09-08 21:21     ` Romain Francoise
  1 sibling, 1 reply; 29+ messages in thread
From: joakim @ 2009-09-08 15:10 UTC (permalink / raw)
  To: Chong Yidong; +Cc: Tom Tromey, emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:

> joakim@verona.se writes:
>
>> Emacs hackers would still need easy access to these tools. Maybe this is
>> a further case for including something like Tom Tromey's ELPA in Emacs?
>
> Eric's still going to develop CEDET in his repository, so if you'll be
> hacking on CEDET, I think you should use the version of CEDET he has
> installed, instead of the version that will eventually be bundled with
> Emacs.
>
> This is a practical matter: since CEDET is such a large and complicated
> package, we shouldn't make changes directly to our copy of it, apart
> from those that are necessary to adapt it to Emacs' conventions and
> build system (which is what I've been working on).  Instead, changes
> should be applied first to Eric's repository, then merged back into our
> tree.

I will manage. I was more thinking of newcomers to the project that
would like to contribute grammars for instance.

-- 
Joakim Verona





* Re: CEDET merge question
  2009-09-08 15:10     ` joakim
@ 2009-09-08 17:18       ` Chong Yidong
  0 siblings, 0 replies; 29+ messages in thread
From: Chong Yidong @ 2009-09-08 17:18 UTC (permalink / raw)
  To: joakim; +Cc: Tom Tromey, emacs-devel

joakim@verona.se writes:

>> This is a practical matter: since CEDET is such a large and complicated
>> package, we shouldn't make changes directly to our copy of it, apart
>> from those that are necessary to adapt it to Emacs' conventions and
>> build system (which is what I've been working on).  Instead, changes
>> should be applied first to Eric's repository, then merged back into our
>> tree.
>
> I will manage. I was more thinking of newcomers to the project that
> would like to contribute grammars for instance.

I agree that it would be good to make it easier for newcomers to hack on
the parsing infrastructure.  The ELPA suggestion is a good one, but I
think it's a bit ambitious to implement it in the 23.2 timeframe.





* Re: CEDET merge question
  2009-09-08 14:41   ` Chong Yidong
  2009-09-08 15:10     ` joakim
@ 2009-09-08 21:21     ` Romain Francoise
  2009-09-08 22:27       ` Chong Yidong
  1 sibling, 1 reply; 29+ messages in thread
From: Romain Francoise @ 2009-09-08 21:21 UTC (permalink / raw)
  To: Chong Yidong; +Cc: emacs-devel

Chong Yidong <cyd@stupidchicken.com> writes:

> This is a practical matter: since CEDET is such a large and
> complicated package, we shouldn't make changes directly to our
> copy of it, apart from those that are necessary to adapt it to
> Emacs' conventions and build system (which is what I've been
> working on).  Instead, changes should be applied first to Eric's
> repository, then merged back into our tree.

If it's so large and complicated that we can't handle it like the
rest of Emacs, is it really a good idea to merge it in?  I don't
think we have any other packages where the rule is "make the change
upstream first".  That sounds like a liability to me.





* Re: CEDET merge question
  2009-09-08 21:21     ` Romain Francoise
@ 2009-09-08 22:27       ` Chong Yidong
  0 siblings, 0 replies; 29+ messages in thread
From: Chong Yidong @ 2009-09-08 22:27 UTC (permalink / raw)
  To: Romain Francoise; +Cc: emacs-devel

Romain Francoise <romain@orebokech.com> writes:

> I don't think we have any other packages where the rule is "make the
> change upstream first".  That sounds like a liability to me.

Actually, that's the situation for Org mode.  If you want to *develop*
Org mode, I would encourage you to work on the upstream version, not the
version in Emacs.

This refers to development, not bug fixes (I apologize if my prior post
caused confusion).  In the preceding discussion, Joakim was talking
about writing new Semantic grammars, i.e. development.  Bugfixes can of
course be applied to the version in the Emacs repository.  (Though
bugfixes should also be pushed upstream, in most cases.)

In the future, we will want to integrate CEDET more deeply into Emacs.
When that time comes, we'll need a new arrangement.  But for the time
being, I'd prefer to treat CEDET more like (say) Org mode than (say)
Calendar.


* Re: CEDET merge question
  2009-09-07 13:33       ` Richard Stallman
@ 2009-09-12 12:49         ` Eric M. Ludlam
  2009-09-12 13:37           ` Miles Bader
                             ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Eric M. Ludlam @ 2009-09-12 12:49 UTC (permalink / raw)
  To: rms; +Cc: cyd, raeburn, David Engster, emacs-devel

On Mon, 2009-09-07 at 09:33 -0400, Richard Stallman wrote:
> CEDET uses Bison grammars which are extended through "Optional Lambda
>     Expressions" (OLE). They produce the actual tags, which are the basic
>     objects resulting from the parsing stage. I don't think this can be
>     easily replaced by Bison itself or Guile.
> 
> Why is it hard to add these to Bison?
> It can handle embedded C code, so why not embedded Lisp code?
> It should be straightforward to make such changes.

I don't know how bison works, but I would assume that bison parses basic
C code (thus replacing $1 with some other piece of code.)  In the same
way, it would need to be taught about Emacs Lisp, Scheme, or any other
language someone might want.

Bison also outputs the code needed for traversing the generated parser
table.  When creating more than one parser in one application (ie - any
scripting language case) this would be detrimental since it is basically
the same code for every parser, which is wasteful.
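
To make the "same code for every parser" point concrete, here is a
minimal sketch in Python of the alternative: one generic driver shared
by several grammars, each of which contributes only a table. This is a
toy LL(1) recognizer written for this note, not Bison's LALR driver and
not Semantic's; the names ll1_parse, PARENS and NAMES are invented.

```python
def ll1_parse(table, start, tokens):
    """Generic LL(1) driver; the same function serves every table.

    `table` maps (nonterminal, lookahead) to the production to expand,
    given as a list of symbols. "$" marks end of input.
    """
    nonterminals = {nt for nt, _ in table}
    stack = ["$", start]
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in nonterminals:
            production = table.get((top, look))
            if production is None:
                return False  # no production for this lookahead
            stack.extend(reversed(production))
        elif top == look:
            pos += 1  # terminal matches the input: consume it
        else:
            return False
    return pos == len(tokens)


# Grammar 1: balanced parentheses.  S -> ( S ) S | empty
PARENS = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],
    ("S", "$"): [],
}

# Grammar 2: comma-separated identifiers.  L -> id R ; R -> , id R | empty
NAMES = {
    ("L", "id"): ["id", "R"],
    ("R", ","): [",", "id", "R"],
    ("R", "$"): [],
}

balanced = ll1_parse(PARENS, "S", list("(())()"))
listed = ll1_parse(NAMES, "L", ["id", ",", "id"])
```

Adding a third language here means adding a third table, with no new
driver code, which is the property a multi-language tool inside one
application wants.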

That said, I do think that it is possible, and maybe even desirable to
do such a thing.  The end result, however, would involve rather extreme
changes to bison, and possibly flex if flex is also used.

As others have pointed out, there are newer parser technologies
available too such as PEG.  How much of that is fad vs fabulous, I don't
really know.  What I do know is that the CEDET tools don't care much
about the specifics of the parser.  The parser tools it does have are to
make it easy to create new parsers so Emacs can support a large number
of languages.

A very similar question to "why not make bison support Emacs Lisp
output", is "why not have gcc support tagging output".

If gcc supported a tagging output format with the details needed for
CEDET to get its job done, it could just call out to gcc instead of
parsing it in Emacs.  CEDET would then magically support a lot more
languages.

There are a huge number of tools out there trying to do what gcc does,
like ctags, etags, ectags, cscope, gnu global, doxygen, and idutils.
What's worse is that none of them work well.

Of course, an Emacs Lisp parser can do lots of other things besides
create tags.  That's just what it is currently used for.

Eric


* Re: CEDET merge question
  2009-09-12 12:49         ` Eric M. Ludlam
@ 2009-09-12 13:37           ` Miles Bader
  2009-09-13 16:39             ` Richard Stallman
  2009-09-12 16:34           ` David Engster
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Miles Bader @ 2009-09-12 13:37 UTC (permalink / raw)
  To: eric; +Cc: cyd, raeburn, rms, David Engster, emacs-devel

"Eric M. Ludlam" <eric@siege-engine.com> writes:
> As others have pointed out, there are newer parser technologies
> available too such as PEG.  How much of that is fad vs fabulous, I don't
> really know.

It's only anecdotal, but my experience with Lpeg is that it's a lot more
convenient and approachable than olde-style stuff like bison/flex, and
not obviously any less powerful.

In any case, it seems clear that some thought should be given before
putting any significant effort into bison/flex.

-Miles

-- 
Fast, small, soon; pick any 2.





* Re: CEDET merge question
  2009-09-12 12:49         ` Eric M. Ludlam
  2009-09-12 13:37           ` Miles Bader
@ 2009-09-12 16:34           ` David Engster
  2009-09-13 16:39           ` Richard Stallman
  2009-09-13 16:40           ` Richard Stallman
  3 siblings, 0 replies; 29+ messages in thread
From: David Engster @ 2009-09-12 16:34 UTC (permalink / raw)
  To: eric; +Cc: cyd, raeburn, rms, emacs-devel

Eric M. Ludlam <eric@siege-engine.com> writes:
> On Mon, 2009-09-07 at 09:33 -0400, Richard Stallman wrote:
>> CEDET uses Bison grammars which are extended through "Optional Lambda
>>     Expressions" (OLE). They produce the actual tags, which are the basic
>>     objects resulting from the parsing stage. I don't think this can be
>>     easily replaced by Bison itself or Guile.
>> 
>> Why is it hard to add these to Bison?
>> It can handle embedded C code, so why not embedded Lisp code?
>> It should be straightforward to make such changes.

[...]

> A very similar question to "why not make bison support Emacs Lisp
> output", is "why not have gcc support tagging output".
>
> If gcc supported a tagging output format with the details needed for
> CEDET to get its job done, it could just call out to gcc instead of
> parsing it in Emacs.  CEDET would then magically support a lot more
> languages.

Yes, I think that would be the way to go. Some time ago, I looked at a
way to add Fortran 90/95 parsing to CEDET. It seems there's no free
Bison grammar out there, but there is for example g95-xml [1], which
apparently reuses the g95 parser and produces an XML output file, which
could then be converted to Emacs Lisp tags. Also, in gfortran, there is
a debug option '-fdump-parse-tree', which seems to produce an output
almost usable by Semantic (most importantly, it's missing any source
code information like line numbers, etc.).

Similar to g95-xml, there's gcc-xml [2], which uses gcc's C++ parser to
output an XML file. But it seems its development has stalled, and it
currently can't parse templates, for example.

One problem with this approach is how the parser reacts to 'code in
progress', meaning syntactically incorrect code which is, for example,
lacking some closing statements. I think that g95-xml just aborted in
this case, which is why I never went further with this project.

-David

[1] http://sourceforge.net/projects/g95-xml/
[2] http://www.gccxml.org/HTML/Index.html


* Re: CEDET merge question
  2009-09-12 12:49         ` Eric M. Ludlam
  2009-09-12 13:37           ` Miles Bader
  2009-09-12 16:34           ` David Engster
@ 2009-09-13 16:39           ` Richard Stallman
  2009-09-13 17:38             ` Eric M. Ludlam
  2009-09-13 16:40           ` Richard Stallman
  3 siblings, 1 reply; 29+ messages in thread
From: Richard Stallman @ 2009-09-13 16:39 UTC (permalink / raw)
  To: eric; +Cc: cyd, raeburn, deng, emacs-devel

    A very similar question to "why not make bison support Emacs Lisp
    output", is "why not have gcc support tagging output".

Could you please explain what you mean by that?






* Re: CEDET merge question
  2009-09-12 13:37           ` Miles Bader
@ 2009-09-13 16:39             ` Richard Stallman
  2009-09-14 11:22               ` tomas
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Stallman @ 2009-09-13 16:39 UTC (permalink / raw)
  To: Miles Bader; +Cc: cyd, raeburn, emacs-devel, deng, eric

What is Lpeg, and what does it do?





* Re: CEDET merge question
  2009-09-12 12:49         ` Eric M. Ludlam
                             ` (2 preceding siblings ...)
  2009-09-13 16:39           ` Richard Stallman
@ 2009-09-13 16:40           ` Richard Stallman
  3 siblings, 0 replies; 29+ messages in thread
From: Richard Stallman @ 2009-09-13 16:40 UTC (permalink / raw)
  To: eric; +Cc: cyd, raeburn, deng, emacs-devel

    I don't know how bison works, but I would assume that bison parses basic
    C code (thus replacing $1 with some other piece of code.)  In the same
    way, it would need to be taught about Emacs Lisp, Scheme, or any other
    language someone might want.

Bison parses grammar definition files, which can contain segments of code.
Normally the syntax for a segment of code is {...}.

Bison generates tables for a parser, and puts the segments of code
into a function to do the parsing.  Normally that function is written
in C.

However, using a different language and different syntax is just a
superficial change.

      The end result, however, would involve rather extreme
    changes to bison, and possibly flex if flex is also used.

Oh no.  The complex parts of Bison would not be changed at all.
Only some of the parser and the output code.  These are the parts that
are easy to understand, without even minimal knowledge of parsing.
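
To make the "superficial change" point concrete, here is a minimal
sketch in Python of how the $N references inside an action segment
could be rewritten for two different output languages.  Everything in
it is hypothetical: expand_action, c_ref and lisp_ref are names invented
for this illustration, and the yyvsp[...] and (nth ...) forms only
loosely imitate what a generated parser might use.  It is not Bison's
actual implementation.

```python
import re

def expand_action(action, rhs_length, value_ref, result_ref):
    """Rewrite $$ and $N in an action segment for one target language.

    value_ref(index, rhs_length) returns the target-language expression
    for the value of the index-th right-hand-side symbol (hypothetical).
    """
    def replace(match):
        token = match.group(1)
        if token == "$":                      # $$ is the rule's result
            return result_ref
        index = int(token)
        if not 1 <= index <= rhs_length:
            raise ValueError(f"${index} is outside the rule")
        return value_ref(index, rhs_length)

    return re.sub(r"\$(\$|\d+)", replace, action)

# Two hypothetical back-ends; only the string templates differ.
def c_ref(index, rhs_length):
    return f"yyvsp[{index - rhs_length}]"     # offset from top of value stack

def lisp_ref(index, rhs_length):
    return f"(nth {index - 1} vals)"          # element of a list of values

c_code = expand_action("$$ = $1 + $3;", 3, c_ref, "yyval")
lisp_code = expand_action("(+ $1 $3)", 3, lisp_ref, "result")
```

The table construction never looks inside the action text, which is
why only the substitution templates differ between the two back-ends
in this sketch.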

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: CEDET merge question
  2009-09-13 16:39           ` Richard Stallman
@ 2009-09-13 17:38             ` Eric M. Ludlam
  2009-09-14 18:28               ` Richard Stallman
  0 siblings, 1 reply; 29+ messages in thread
From: Eric M. Ludlam @ 2009-09-13 17:38 UTC (permalink / raw)
  To: rms; +Cc: cyd, raeburn, deng, emacs-devel

On Sun, 2009-09-13 at 12:39 -0400, Richard Stallman wrote:
> A very similar question to "why not make bison support Emacs Lisp
>     output", is "why not have gcc support tagging output".
> 
> Could you please explain what you mean by that?

Sure.

Etags, ctags, gnu global, idutils and cscope all have parsers of some
sort that parse C and C++ code.  Some use regexp matchers.  Others have
primitive parsers.

gcc, of course, has a full language compliant parser which it uses to
compile code.  I'm not a gcc expert, but I assume that as it parses, it
keeps track of the various symbols (functions, variables, namespaces,
etc.) and where they are (i.e., debug info for gdb).

As such, it should be possible for gcc to easily output text
representing a tags file.  Etags style would be fairly simple.  The
output of exuberant ctags is more complex.  The data needed by CEDET is
more complex still, but is still a subset of everything that gcc needs
to know.

For CEDET, if gcc saw this file:

-------------
int main(int argc, char *argv[]) {
}
-------------

it would be handy (for my application) for it to output:

--------------
(("main" function
  (:arguments
   (("argc" variable (:type "int") [11 20])
    ("argv" variable (:pointer 1 :dereference 1 :type "char")
            [21 34]))
   :type "int")
  [2 38]))
---------------

though, to be honest, any text output that is very regular would be
fine.
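
As a rough illustration of how regular such output can be, here is a
minimal Python sketch that renders a tag record in the shape shown
above.  The Tag class and the render functions are hypothetical names
made up for this example; they are not part of CEDET or gcc, and the
positions are simply copied from the example rather than computed.

```python
from dataclasses import dataclass, field

@dataclass
class Tag:
    name: str
    klass: str                                  # e.g. "function", "variable"
    attrs: dict = field(default_factory=dict)   # keyword -> value
    start: int = 0
    end: int = 0

def render_value(value):
    """Render one attribute value: a tag, a list, a string or a number."""
    if isinstance(value, Tag):
        return render(value)
    if isinstance(value, list):
        return "(" + " ".join(render_value(v) for v in value) + ")"
    if isinstance(value, str):
        return '"' + value + '"'                # no escaping in this sketch
    return str(value)

def render(tag):
    """Render a tag as ("name" class (:key value ...) [start end])."""
    attrs = " ".join(f":{key} {render_value(val)}"
                     for key, val in tag.attrs.items())
    return f'("{tag.name}" {tag.klass} ({attrs}) [{tag.start} {tag.end}])'

argc = Tag("argc", "variable", {"type": "int"}, 11, 20)
argv = Tag("argv", "variable",
           {"pointer": 1, "dereference": 1, "type": "char"}, 21, 34)
main = Tag("main", "function", {"arguments": [argc, argv], "type": "int"},
           2, 38)
text = render(main)
```

Any producer that can walk its symbol table could emit this kind of
text; the hard part is the parsing, not the formatting.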

The part that makes this imperfect is that in Emacs, a file that needs
parsing may be in the middle of an edit.  Handling these cases can be a
bit tricky for my simplified parser, and gcc doesn't have that editing
information available.

To handle this, the CEDET tools have different ways to parse files, such
as "on save", and can track when a file is unparsable and take alternate
actions when that happens.

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: CEDET merge question
  2009-09-13 16:39             ` Richard Stallman
@ 2009-09-14 11:22               ` tomas
  2009-09-14 12:15                 ` Miles Bader
  0 siblings, 1 reply; 29+ messages in thread
From: tomas @ 2009-09-14 11:22 UTC (permalink / raw)
  To: Richard Stallman; +Cc: deng, cyd, emacs-devel, raeburn, eric, Miles Bader

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, Sep 13, 2009 at 12:39:44PM -0400, Richard Stallman wrote:
> What is Lpeg, and what does it do?

PEG stands for "Parsing Expression Grammars" and it is a grammar
notation which is basically a formal description of a recursive descent
parser.

They are said to be a bit more powerful than context-free grammars and
(usually) more expressive. The most salient point for us "old-timers"
is probably that the choices are "ordered" -- this has some price, but
we get something for that: the distinction between lexer and parser
becomes more flexible. The relevant paper seems to be [1].

It seems that they are very nice to bind to a language.

LPEG is the implementation of PEGs to be used in Lua.

[1] <http://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf>

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKridbBcgs9XrR2kYRAr7mAJ4wFQQd1aKLujMnAvlNST/TlibSUQCfTeCI
qOWOujkLZVNLsv+I8/vUlbM=
=88yG
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: CEDET merge question
  2009-09-14 11:22               ` tomas
@ 2009-09-14 12:15                 ` Miles Bader
  2009-09-14 20:04                   ` tomas
  0 siblings, 1 reply; 29+ messages in thread
From: Miles Bader @ 2009-09-14 12:15 UTC (permalink / raw)
  To: tomas; +Cc: Richard Stallman, deng, cyd, emacs-devel, raeburn, eric

tomas@tuxteam.de writes:
> PEG stands for "Parsing Expression Grammars" and it is a grammar
> notation which basically represents formally a recursive descent parser.
>
> They are said to be a bit more powerful than context free grammars and
> (usually) more expressive. The most salient point for us "old-timers"
> is probably that the choices are "ordered" -- this has some price, but
> we get something for that: the distinction between lexer and parser
> becomes more flexible. The relevant paper seems to be [1].
>
> LPEG is the implementation of PEGs to be used in Lua.
>
> [1] <http://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf>

Note that while LPEG is a PEG parser, it's _not_ a packrat parser (as in
[1]); the packrat algorithm is just an implementation technique.

I've appended a copy of a message I sent a while ago to emacs-devel
on the same subject (LPEG vs. packrat).

Note that I think it's not just the implementation technique which is
interesting about LPEG, but also the very nice manner in which it's
integrated with the language and made available for use.  It's just an
amazingly powerful and handy tool.  I recommend reading the LPEG web
page, where it gives a quick overview of it.

Since Lua in many ways feels quite similar to Lisp, I think an elisp
version would be similarly very natural and powerful.  One difference
though -- for Lua, LPEG uses overloaded operators for building up
grammars; in elisp, it would probably be better to just use
s-expressions to represent grammars, using backquotes to embed
non-literal values.
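
To give a feel for grammars written as plain nested data, here is a
minimal sketch in Python, with tuples standing in for s-expressions.
It is a naive recursive matcher, not LPEG's algorithm, and the operator
names ("seq", "choice", "star", "range", "ref") are assumptions made up
for this example.

```python
def match(grammar, expr, text, pos=0):
    """Return the position after matching expr at pos, or None on failure."""
    if isinstance(expr, str):                  # literal string
        return pos + len(expr) if text.startswith(expr, pos) else None
    op, *args = expr
    if op == "seq":                            # every part, one after another
        for part in args:
            pos = match(grammar, part, text, pos)
            if pos is None:
                return None
        return pos
    if op == "choice":                         # ordered: first success wins
        for alternative in args:
            end = match(grammar, alternative, text, pos)
            if end is not None:
                return end
        return None
    if op == "star":                           # greedy, zero or more times
        while True:
            end = match(grammar, args[0], text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    if op == "range":                          # one character in [lo, hi]
        lo, hi = args
        return pos + 1 if pos < len(text) and lo <= text[pos] <= hi else None
    if op == "ref":                            # reference to a named rule
        return match(grammar, grammar[args[0]], text, pos)
    raise ValueError(f"unknown operator: {op}")

GRAMMAR = {
    "digit": ("range", "0", "9"),
    "number": ("seq", ("ref", "digit"), ("star", ("ref", "digit"))),
    "atom": ("choice", ("ref", "number"),
             ("seq", "(", ("ref", "sum"), ")")),
    "sum": ("seq", ("ref", "atom"), ("star", ("seq", "+", ("ref", "atom")))),
}

end = match(GRAMMAR, ("ref", "sum"), "1+(23+4)")
```

Because choice is ordered, ("choice", "a", "ab") matches only the "a"
of "ab" and never tries the second alternative, whereas
("choice", "ab", "a") consumes both characters.  Note also that the
grammar above works on characters directly, with no separate lexer.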

[earlier message:]

You also might be interested in Roberto Ierusalimschy's paper on the
implementation of LPEG, which is a PEG implementation for Lua:

  http://www.inf.puc-rio.br/~roberto/docs/peg.pdf

Note that LPEG does _not_ use the packrat algorithm, as apparently it
presents some serious practical problems for common uses of parsing
tools:

     In 2002, Ford proposed Packrat [5], an adaptation of the original
  algorithm that uses lazy evaluation to avoid that inefficiency.

     Even with this improvement, however, the space complexity of the
  algorithm is still linear on the subject size (with a somewhat big
  constant), even in the best case. As its author himself recognizes,
  this makes the algorithm not befitting for parsing “large amounts of
  relatively flat” data ([5], p. 57). However, unlike parsing tools,
  regular-expression tools aim exactly at large amounts of relatively
  flat data.

     To avoid these difficulties, we did not use the Packrat algorithm
  for LPEG. To implement LPEG we created a virtual parsing machine, not
  unlike Knuth’s parsing machine [15], where each pattern is
  represented as a program for the machine. The program is somewhat
  similar to a recursive-descendent parser (with limited backtracking)
  for the pattern, but it uses an explicit stack instead of recursion.
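
The idea of a pattern compiled to a program for a small machine with an
explicit backtrack stack can be sketched in a few lines of Python.
This is a simplified illustration, not LPEG's actual instruction set:
the instruction names ("char", "choice", "commit", "end") follow the
general scheme described in the paper, rule calls and captures are
omitted, and run is a name made up for this example.

```python
def run(program, subject):
    """Run a parsing-machine program; return the match end, or None."""
    pc, pos = 0, 0
    backtrack = []   # explicit stack of (alternative pc, saved position)
    while True:
        op, *args = program[pc]
        if op == "end":                        # whole pattern matched
            return pos
        if op == "choice":                     # remember where to resume
            backtrack.append((args[0], pos))
            pc += 1
        elif op == "commit":                   # alternative no longer needed
            backtrack.pop()
            pc = args[0]
        elif op == "char":
            if pos < len(subject) and subject[pos] == args[0]:
                pc, pos = pc + 1, pos + 1
            elif backtrack:                    # failure: restore saved state
                pc, pos = backtrack.pop()
            else:
                return None
        else:
            raise ValueError(f"unknown instruction: {op}")

# "ab" / "ac" as an ordered choice
AB_OR_AC = [
    ("choice", 4),
    ("char", "a"), ("char", "b"),
    ("commit", 6),
    ("char", "a"), ("char", "c"),
    ("end",),
]

# "a"* as a loop: each iteration saves an exit point, then commits
A_STAR = [
    ("choice", 3),
    ("char", "a"),
    ("commit", 0),
    ("end",),
]
```

Here the only memory used besides the program is the backtrack stack,
which stays small for flat input, in contrast to the per-position
tables a packrat parser keeps.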

The general LPEG page is here:

  http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html

-Miles

-- 
Back, n. That part of your friend which it is your privilege to contemplate in
your adversity.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: CEDET merge question
  2009-09-13 17:38             ` Eric M. Ludlam
@ 2009-09-14 18:28               ` Richard Stallman
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Stallman @ 2009-09-14 18:28 UTC (permalink / raw)
  To: eric; +Cc: cyd, raeburn, deng, emacs-devel

    Etags, ctags, gnu global, idutils and cscope all have parsers of some
    sort that parse C and C++ code.  Some use regexp matchers.  Others have
    primitive parsers.

    gcc, of course, has a full language compliant parser which it uses to
    compile code.  I'm not a gcc expert, but I assume that as it parses, it
    keeps track of the various symbols (functions, variables, namespaces,
    etc.) and where they are (i.e., debug info for gdb).

Now I know what you are talking about.  This idea seems very
appealing, but it has a grave flaw.  The flaw comes from the way GCC
handles input: it does preprocessing first, and real parsing operates
only on the output of preprocessing.  So the output that GCC can
easily make would describe only the output of preprocessing.
Definitions and calls which are not actually compiled won't be seen at
all.  Macros and references to them won't be seen at all.
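
A small Python sketch can illustrate the difference.  It uses a toy
preprocessor (only #define, #ifdef, #else and #endif, without nesting)
and a crude regexp-based tagger; both are assumptions made up for this
example and resemble neither GCC nor etags in detail.

```python
import re

SOURCE = """\
#define USE_FAST 1
#ifdef USE_FAST
int lookup_fast(int key) { return key; }
#else
int lookup_slow(int key) { return key; }
#endif
"""

# Crude tagger: a type word, optional stars, a name, an open paren.
FUNC_RE = re.compile(r"^[A-Za-z_]\w*\s+\**(\w+)\s*\(", re.MULTILINE)

def preprocess(source):
    """Return only the lines a toy preprocessor would pass on."""
    defined, keeping, out = set(), True, []
    for line in source.splitlines():
        words = line.split()
        if line.startswith("#define"):
            if keeping:
                defined.add(words[1])
        elif line.startswith("#ifdef"):
            keeping = words[1] in defined
        elif line.startswith("#else"):
            keeping = not keeping
        elif line.startswith("#endif"):
            keeping = True
        elif keeping:
            out.append(line)
    return "\n".join(out)

def function_names(text):
    """List the function names the crude tagger finds in text."""
    return FUNC_RE.findall(text)

from_raw_source = function_names(SOURCE)
after_preprocessing = function_names(preprocess(SOURCE))
```

Scanning the raw source finds both lookup_fast and lookup_slow, while
scanning the preprocessed text finds only lookup_fast, and the macro
USE_FAST has disappeared from the text entirely.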

What etags does now is much better, because it avoids that problem.

It is true that output from GCC would give more details about types,
etc., and would avoid getting confused in a few strange situations.
So there is indeed an advantage to generating the output from GCC.
But the disadvantage is much more important.

I designed a way to make GCC analyze and report on macros and on the
code that's not compiled in.  That would get the best of both aspects.
But this is not a small job.  Please don't ask me to write more
details unless you're prepared to do a substantial amount of work
and study the GCC parsing code carefully.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: CEDET merge question
  2009-09-14 12:15                 ` Miles Bader
@ 2009-09-14 20:04                   ` tomas
  0 siblings, 0 replies; 29+ messages in thread
From: tomas @ 2009-09-14 20:04 UTC (permalink / raw)
  To: Miles Bader
  Cc: Richard Stallman, deng, cyd, emacs-devel, tomas, eric, raeburn

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Sep 14, 2009 at 09:15:02PM +0900, Miles Bader wrote:
[...]
> Note that while LPEG is a PEG parser, it's _not_ a packrat parser
> [...]

Thanks, Miles, for the (as usual, clearly expounded) insights.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKrqHUBcgs9XrR2kYRAhQOAJoDlVSzNGIa0TTaPK0tThYoKSW1bgCdFefi
VVADVlO9geSG8Vonhv9d/Jk=
=gAgk
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2009-09-14 20:04 UTC | newest]

Thread overview: 29+ messages
2009-09-05 16:28 CEDET merge question Chong Yidong
2009-09-05 17:22 ` David Engster
2009-09-05 20:53   ` Chong Yidong
2009-09-05 23:08     ` David Engster
2009-09-06 15:37 ` Richard Stallman
2009-09-06 17:46   ` Ken Raeburn
2009-09-06 21:11     ` David Engster
2009-09-06 22:26       ` Ken Raeburn
2009-09-07 13:33       ` Richard Stallman
2009-09-12 12:49         ` Eric M. Ludlam
2009-09-12 13:37           ` Miles Bader
2009-09-13 16:39             ` Richard Stallman
2009-09-14 11:22               ` tomas
2009-09-14 12:15                 ` Miles Bader
2009-09-14 20:04                   ` tomas
2009-09-12 16:34           ` David Engster
2009-09-13 16:39           ` Richard Stallman
2009-09-13 17:38             ` Eric M. Ludlam
2009-09-14 18:28               ` Richard Stallman
2009-09-13 16:40           ` Richard Stallman
2009-09-07 13:34     ` Richard Stallman
2009-09-08  8:11 ` joakim
2009-09-08  9:07   ` Lennart Borgman
2009-09-08  9:09     ` Lennart Borgman
2009-09-08 14:41   ` Chong Yidong
2009-09-08 15:10     ` joakim
2009-09-08 17:18       ` Chong Yidong
2009-09-08 21:21     ` Romain Francoise
2009-09-08 22:27       ` Chong Yidong
