all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* State-machine based syntax highlighting
@ 2006-12-07  6:14 spamfilteraccount
  2006-12-07 10:53 ` Robert Thorpe
  0 siblings, 1 reply; 28+ messages in thread
From: spamfilteraccount @ 2006-12-07  6:14 UTC (permalink / raw)


I just read that in the text editor FTE does syntax highlighting can be
defined with state-machines.

Here's a LUA example I found: http://t-o-m-e.net/tmp/m_lua.fte

Does anyone know the dis/advantages of this method compared to the
regexp-based emacs approach? E.g. would it work faster than the current
emacs implementation?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07  6:14 State-machine based syntax highlighting spamfilteraccount
@ 2006-12-07 10:53 ` Robert Thorpe
  2006-12-07 11:56   ` spamfilteraccount
  0 siblings, 1 reply; 28+ messages in thread
From: Robert Thorpe @ 2006-12-07 10:53 UTC (permalink / raw)


spamfilteraccount@gmail.com wrote:
> I just read that in the text editor FTE does syntax highlighting can be
> defined with state-machines.
>
> Here's a LUA example I found: http://t-o-m-e.net/tmp/m_lua.fte
>
> Does anyone know the dis/advantages of this method compared to the
> regexp-based emacs approach?

Regexp are state machines.  Or, to be more precise the regexp engine
compiles regexp it is given into discrete finite state machines.
Defining state machines manually is usually worse than generating them
from regexp normally, because a human cannot do the regexp
optimizations that the regexp engine can.

In my view the real way to improve Emacs syntax highlighting would be
to make it based on parsing.

> E.g. would it work faster than the current
> emacs implementation?

Do you have a problem with the speed of a regexp you have written?  If
so it's probably down to the regexp or the way you're trying to do
things.  Post the code here and someone may be able to help you.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 10:53 ` Robert Thorpe
@ 2006-12-07 11:56   ` spamfilteraccount
  2006-12-07 12:42     ` Robert Thorpe
  0 siblings, 1 reply; 28+ messages in thread
From: spamfilteraccount @ 2006-12-07 11:56 UTC (permalink / raw)



Robert Thorpe wrote:
>
> In my view the real way to improve Emacs syntax highlighting would be
> to make it based on parsing.
>

Yes, it could be better, though in this case emacs would rely on
external tools doing the actual parsing, because I don't think the
syntax parsing of every language should be reimplemented in elisp.

That's not a big deal, since if I need to work with c files then
usually have the c compiler installed.

The only thing needed is a compiler for the given language which
outputs syntax information for the source file. I don't know if GCC can
do that.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 11:56   ` spamfilteraccount
@ 2006-12-07 12:42     ` Robert Thorpe
  2006-12-07 14:27       ` spamfilteraccount
  0 siblings, 1 reply; 28+ messages in thread
From: Robert Thorpe @ 2006-12-07 12:42 UTC (permalink / raw)


spamfilteraccount@gmail.com wrote:
> Robert Thorpe wrote:
> >
> > In my view the real way to improve Emacs syntax highlighting would be
> > to make it based on parsing.
> >
>
> Yes, it could be better, though in this case emacs would rely on
> external tools doing the actual parsing, because I don't think the
> syntax parsing of every language should be reimplemented in elisp.
>
> That's not a big deal, since if I need to work with c files then
> usually have the c compiler installed.

No, it would probably have to be reimplemented inside Emacs.
There are many differences between parsing an language in order to
compile it and parsing a language in order to perform syntax
highlighting and movement commands.

In the later case you have to be able to tolerate expressions near to
point that are incorrectly formatted because the user is still typing
them.  Also you have to be able to trigger the process from any given
point, so that if the user jumps from line 1 to line 5794 you don't
have to parse everything in the intervening code.  Also, when
highlighting you don't care about the contents of the code much.

The Emacs "Semantic" package already does much of this, so do some
other editors.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 12:42     ` Robert Thorpe
@ 2006-12-07 14:27       ` spamfilteraccount
  2006-12-07 14:39         ` Robert Thorpe
  0 siblings, 1 reply; 28+ messages in thread
From: spamfilteraccount @ 2006-12-07 14:27 UTC (permalink / raw)



Robert Thorpe wrote:
>
> The Emacs "Semantic" package already does much of this, so do some
> other editors.

It would make more sense to create one such parser than reimplementing
parsing in every editor...


I did a quick search and found this page


http://harmonia.cs.berkeley.edu/harmonia/projects/harmonia-mode/doc/index.html

with a demo xemacs package with syntax highlighting and stuff. Looked
interesting.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 14:27       ` spamfilteraccount
@ 2006-12-07 14:39         ` Robert Thorpe
  2006-12-07 17:02           ` spamfilteraccount
  0 siblings, 1 reply; 28+ messages in thread
From: Robert Thorpe @ 2006-12-07 14:39 UTC (permalink / raw)


spamfilteraccount@gmail.com wrote:
> Robert Thorpe wrote:
> >
> > The Emacs "Semantic" package already does much of this, so do some
> > other editors.
>
> It would make more sense to create one such parser than reimplementing
> parsing in every editor...

In many ways it would.  But I expect it will be reimplemented in every
editor, for several reasons:-
* The insides of different editors work very differently
* External dependencies make building harder and irritate people
* Elisp is considerably nicer than many programming languages
reimplementing is not so hard
* Many people will make parsers as closed-source, or refuse to assign
copyright to the FSF
* People don't like helping other editors so they don't offer
functionality in an easily usable form
* GNU will not want to offer _parsers_ in an easily usable form,
because doing so would allow proprietery compilers to be built very
easily.

I'm not saying this is necessarily the best way, but I expect it's what
will happen.

> I did a quick search and found this page
>
>
> http://harmonia.cs.berkeley.edu/harmonia/projects/harmonia-mode/doc/index.html
>
> with a demo xemacs package with syntax highlighting and stuff. Looked
> interesting.

Interesting.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 14:39         ` Robert Thorpe
@ 2006-12-07 17:02           ` spamfilteraccount
  2006-12-07 17:42             ` Stefan Monnier
       [not found]             ` <mailman.1644.1165513359.2155.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 28+ messages in thread
From: spamfilteraccount @ 2006-12-07 17:02 UTC (permalink / raw)


Robert Thorpe wrote:
>
> > I did a quick search and found this page
> >
> > http://harmonia.cs.berkeley.edu/harmonia/projects/harmonia-mode/doc/index.html
> >
> > with a demo xemacs package with syntax highlighting and stuff. Looked
> > interesting.
>
> Interesting.

I wondered why they supported xemacs only, so I downloaded the source.
Seems they wrote xemacs extensions in c which have to be compiled into
xemacs.

Not a usual way to extend an emacs, but probably advantageous from a
performance point of view.

The idea already occured to me that font locking should be implemented
in pure c in emacs for speed, but I guess it's kind of against the
extensible editor concept or something.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 17:02           ` spamfilteraccount
@ 2006-12-07 17:42             ` Stefan Monnier
       [not found]             ` <mailman.1644.1165513359.2155.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Monnier @ 2006-12-07 17:42 UTC (permalink / raw)


>> > I did a quick search and found this page
>> > http://harmonia.cs.berkeley.edu/harmonia/projects/harmonia-mode/doc/index.html
>> > with a demo xemacs package with syntax highlighting and stuff. Looked
>> > interesting.
>> Interesting.
> I wondered why they supported xemacs only, so I downloaded the source.
> Seems they wrote xemacs extensions in c which have to be compiled into
> xemacs.

> Not a usual way to extend an emacs, but probably advantageous from a
> performance point of view.

> The idea already occured to me that font locking should be implemented
> in pure c in emacs for speed, but I guess it's kind of against the
> extensible editor concept or something.

Actually, font-locking *is* implemented in C.  The elisp part usually takes
a negligible amount of time.  The problem start appearing when the
functionality of the C code is not sufficient and you start trying to parse
the code in elisp, which is slow.


        Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
       [not found]             ` <mailman.1644.1165513359.2155.help-gnu-emacs@gnu.org>
@ 2006-12-07 18:35               ` spamfilteraccount
  2006-12-07 18:57                 ` Robert Thorpe
  2006-12-07 19:02                 ` Stefan Monnier
  0 siblings, 2 replies; 28+ messages in thread
From: spamfilteraccount @ 2006-12-07 18:35 UTC (permalink / raw)



Stefan Monnier wrote:
>
> Actually, font-locking *is* implemented in C.  The elisp part usually takes
> a negligible amount of time.  The problem start appearing when the
> functionality of the C code is not sufficient and you start trying to parse
> the code in elisp, which is slow.

Good to know. I thought font-lock was implemented in elisp and didn't
bother to check.

BTW, I checked the situation in the enemy camp and seems they also have
problems with performance:

- The colors are wrong when scrolling bottom to top.
	Vim doesn't read the whole file to parse the text.  It starts parsing
	wherever you are viewing the file.  That saves a lot of time, but
	sometimes the colors are wrong.  A simple fix is hitting CTRL-L.  Or
	scroll back a bit and then forward again.
	For a real fix, see |:syn-sync|.  Some syntax files have a way to make
	it look further back, see the help for the specific syntax file.  For
	example, |tex.vim| for the TeX syntax.

...

Displaying text in color takes a lot of effort.  If you find the
displaying
too slow, you might want to disable syntax highlighting for a moment:

	:syntax clear

When editing another file (or the same one) the colors will come back.

http://vimdoc.sourceforge.net/htmldoc/usr_06.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 18:35               ` spamfilteraccount
@ 2006-12-07 18:57                 ` Robert Thorpe
  2006-12-07 20:24                   ` Perry Smith
                                     ` (2 more replies)
  2006-12-07 19:02                 ` Stefan Monnier
  1 sibling, 3 replies; 28+ messages in thread
From: Robert Thorpe @ 2006-12-07 18:57 UTC (permalink / raw)


spamfilteraccount@gmail.com wrote:
> Stefan Monnier wrote:
> >
> > Actually, font-locking *is* implemented in C.  The elisp part usually takes
> > a negligible amount of time.  The problem start appearing when the
> > functionality of the C code is not sufficient and you start trying to parse
> > the code in elisp, which is slow.
>
> Good to know. I thought font-lock was implemented in elisp and didn't
> bother to check.

Precisely speaking...
The code that determines what rules are used to font-lock text is in
Elisp.
The regexp engine that finds the things to be font-locked is in the
core of Emacs.
The colourisation is implemented in the Emacs core.

Overall this means that most of the work is in the Emacs core.

If parsing were to be used to support syntax highlighting then maybe
some work would have to be done to avoid having to use Elisp.  But I'm
not sure since it would still require loads of regexps and they would
probably still eat up a lot of the runtime.

> BTW, I checked the situation in the enemy camp and seems they also have
> problems with performance:

Almost every editor does with both large files and syntactically
complex languages.  As far as I know, Emacs is a little slower than Vim
at least in some cases.

If you want to avoid the problem then use <4000 line files and write
your programs in Lisp.  Those are good things to do anyway ;)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 18:35               ` spamfilteraccount
  2006-12-07 18:57                 ` Robert Thorpe
@ 2006-12-07 19:02                 ` Stefan Monnier
  2006-12-07 19:29                   ` spamfilteraccount
  1 sibling, 1 reply; 28+ messages in thread
From: Stefan Monnier @ 2006-12-07 19:02 UTC (permalink / raw)


> Good to know. I thought font-lock was implemented in elisp and didn't
> bother to check.

If you look at the code you'll probably think it's implemented in elisp.
But if you look at a profile, you'll probably see that it's spending most of
its time in either text-property manipulation functions, or
regexp-matching, or parse-partial-sexp, all of which are written in C.


        Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 19:02                 ` Stefan Monnier
@ 2006-12-07 19:29                   ` spamfilteraccount
  2006-12-08 14:43                     ` Robert Thorpe
  0 siblings, 1 reply; 28+ messages in thread
From: spamfilteraccount @ 2006-12-07 19:29 UTC (permalink / raw)



Stefan Monnier wrote:
> > Good to know. I thought font-lock was implemented in elisp and didn't
> > bother to check.
>
> If you look at the code you'll probably think it's implemented in elisp.
> But if you look at a profile, you'll probably see that it's spending most of
> its time in either text-property manipulation functions, or
> regexp-matching, or parse-partial-sexp, all of which are written in C.

You wrote VIM is a little faster than Emacs. Is it because of the time
spent in the elisp part in emacs or the C part itself is implemented
more efficiently in VIM?

If it's the latter then the C implementations could be compared to see
what VIM does better.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 18:57                 ` Robert Thorpe
@ 2006-12-07 20:24                   ` Perry Smith
  2006-12-08  7:33                   ` spamfilteraccount
       [not found]                   ` <mailman.1653.1165523111.2155.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 28+ messages in thread
From: Perry Smith @ 2006-12-07 20:24 UTC (permalink / raw)
  Cc: help-gnu-emacs


[-- Attachment #1.1: Type: text/plain, Size: 1168 bytes --]


On Dec 7, 2006, at 12:57 PM, Robert Thorpe wrote:

> spamfilteraccount@gmail.com wrote:
>> Stefan Monnier wrote:
>>>
>>> Actually, font-locking *is* implemented in C.  The elisp part  
>>> usually takes
>>> a negligible amount of time.  The problem start appearing when the
>>> functionality of the C code is not sufficient and you start  
>>> trying to parse
>>> the code in elisp, which is slow.
>>
>> Good to know. I thought font-lock was implemented in elisp and didn't
>> bother to check.
>
> Precisely speaking...
> The code that determines what rules are used to font-lock text is in
> Elisp.
> The regexp engine that finds the things to be font-locked is in the
> core of Emacs.
> The colourisation is implemented in the Emacs core.

Instead of a state machine, how about a lalr parser?  It would be a fun
project to take the lalr table generation logic from bison, smash it
into emacs, along with some predefined actions and hooks back
into emacs.  The grammers could be loaded when needed.

Perry Smith ( pedz@easesoftware.com )
Ease Software, Inc. ( http://www.easesoftware.com )

Low cost SATA Disk Systems for IBMs p5, pSeries, and RS/6000 AIX systems



[-- Attachment #1.2: Type: text/html, Size: 6796 bytes --]

[-- Attachment #2: Type: text/plain, Size: 152 bytes --]

_______________________________________________
help-gnu-emacs mailing list
help-gnu-emacs@gnu.org
http://lists.gnu.org/mailman/listinfo/help-gnu-emacs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 18:57                 ` Robert Thorpe
  2006-12-07 20:24                   ` Perry Smith
@ 2006-12-08  7:33                   ` spamfilteraccount
  2006-12-08  8:10                     ` Tim X
       [not found]                   ` <mailman.1653.1165523111.2155.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 28+ messages in thread
From: spamfilteraccount @ 2006-12-08  7:33 UTC (permalink / raw)



Robert Thorpe wrote:
>
> If parsing were to be used to support syntax highlighting then maybe
> some work would have to be done to avoid having to use Elisp.  But I'm
> not sure since it would still require loads of regexps and they would
> probably still eat up a lot of the runtime.
>

That may be true, but the advantage is that parsing actually
understands code, not just matches it with some regexps, so it could be
used for much more than syntax highlighting (some kind of error
checking, code completion, etc.).

I think if there are already parsers written in elisp they should be
intergrated into the official emacs distribution (e.g. in directory
lisp/parsers), so that packages can use them to understand the code
better.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08  7:33                   ` spamfilteraccount
@ 2006-12-08  8:10                     ` Tim X
  2006-12-08  8:36                       ` spamfilteraccount
                                         ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Tim X @ 2006-12-08  8:10 UTC (permalink / raw)


"spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com> writes:

> Robert Thorpe wrote:
>>
>> If parsing were to be used to support syntax highlighting then maybe
>> some work would have to be done to avoid having to use Elisp.  But I'm
>> not sure since it would still require loads of regexps and they would
>> probably still eat up a lot of the runtime.
>>
>
> That may be true, but the advantage is that parsing actually
> understands code, not just matches it with some regexps, so it could be
> used for much more than syntax highlighting (some kind of error
> checking, code completion, etc.).
>
> I think if there are already parsers written in elisp they should be
> intergrated into the official emacs distribution (e.g. in directory
> lisp/parsers), so that packages can use them to understand the code
> better.
>

Have a look at 

http://cedet.sourceforge.net/

The combination of semantic and cedet is, amongst other things, aimed
at providing parse based functionality for emacs. some of this is (I
think) going to be bundled in with emacs 22. The idea is to provide a
more powerful devleopment environment that can do things like code
completion based on more than just abbrevs and dynamic completion
based on recently used keywords and regexp. 

The problem with parse based analysis is that you need an in-built
parser for all the languages that the editor is used to develop in and
this is not a trivial task. I suspect some sort of plugin architecture
that is able to use stand-alone parses for some language of interest
would probably be the way to go as it is unlikely even a small subset
of the languages devleoped within an emacs environment can have a
parser developed in elisp which is readily maintained.

Tim
-- 
tcross (at) rapttech dot com dot au

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08  8:10                     ` Tim X
@ 2006-12-08  8:36                       ` spamfilteraccount
  2006-12-08 16:17                         ` Robert Thorpe
  2006-12-08 13:14                       ` Leo
                                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: spamfilteraccount @ 2006-12-08  8:36 UTC (permalink / raw)



Tim X wrote:
>
> The problem with parse based analysis is that you need an in-built
> parser for all the languages that the editor is used to develop in and
> this is not a trivial task. I suspect some sort of plugin architecture
> that is able to use stand-alone parses for some language of interest
> would probably be the way to go as it is unlikely even a small subset
> of the languages devleoped within an emacs environment can have a
> parser developed in elisp which is readily maintained.

I think too that some kind of bridge or plugin architecture is the
answer.

Lots of languages provide access to syntax trees in some form (python,
java, etc.), so it would be much simpler to use their native
implementation than reinveinting everything in elisp.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
       [not found]                   ` <mailman.1653.1165523111.2155.help-gnu-emacs@gnu.org>
@ 2006-12-08 10:01                     ` Robert Thorpe
  0 siblings, 0 replies; 28+ messages in thread
From: Robert Thorpe @ 2006-12-08 10:01 UTC (permalink / raw)


Perry Smith wrote:
> On Dec 7, 2006, at 12:57 PM, Robert Thorpe wrote:
> > spamfilteraccount@gmail.com wrote:
> >> Stefan Monnier wrote:
> >>>
> >>> Actually, font-locking *is* implemented in C.  The elisp part
> >>> usually takes
> >>> a negligible amount of time.  The problem start appearing when the
> >>> functionality of the C code is not sufficient and you start
> >>> trying to parse
> >>> the code in elisp, which is slow.
> >>
> >> Good to know. I thought font-lock was implemented in elisp and didn't
> >> bother to check.
> >
> > Precisely speaking...
> > The code that determines what rules are used to font-lock text is in
> > Elisp.
> > The regexp engine that finds the things to be font-locked is in the
> > core of Emacs.
> > The colourisation is implemented in the Emacs core.
>
> Instead of a state machine, how about a lalr parser?  It would be a fun
> project to take the lalr table generation logic from bison, smash it
> into emacs, along with some predefined actions and hooks back
> into emacs.  The grammers could be loaded when needed.

Yes.  I've thought about doing that myself, even better would be the
GLR parser system in recent versions of Bison.  It is capable of
parsing any context-free grammar.  I haven't got enough time to work on
such a thing for Emacs myself though, unfortunately.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08  8:10                     ` Tim X
  2006-12-08  8:36                       ` spamfilteraccount
@ 2006-12-08 13:14                       ` Leo
  2006-12-08 14:00                       ` Robert Thorpe
                                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Leo @ 2006-12-08 13:14 UTC (permalink / raw)


On FRI, 8 DEC 2006, Tim X. wrote:

> Have a look at 
>
> http://cedet.sourceforge.net/
>

But this one is really slow.

> ... going to be bundled in with emacs 22.

And I only see speedbar bundled.

-- 
Leo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08  8:10                     ` Tim X
  2006-12-08  8:36                       ` spamfilteraccount
  2006-12-08 13:14                       ` Leo
@ 2006-12-08 14:00                       ` Robert Thorpe
  2006-12-09  2:10                         ` Stefan Monnier
       [not found]                       ` <mailman.1672.1165586758.2155.help-gnu-emacs@gnu.org>
  2006-12-08 21:17                       ` spamfilteraccount
  4 siblings, 1 reply; 28+ messages in thread
From: Robert Thorpe @ 2006-12-08 14:00 UTC (permalink / raw)


Tim X wrote:
> "spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com> writes:
> > Robert Thorpe wrote:
> >>
> >> If parsing were to be used to support syntax highlighting then maybe
> >> some work would have to be done to avoid having to use Elisp.  But I'm
> >> not sure since it would still require loads of regexps and they would
> >> probably still eat up a lot of the runtime.
> >>
> >
> > That may be true, but the advantage is that parsing actually
> > understands code, not just matches it with some regexps, so it could be
> > used for much more than syntax highlighting (some kind of error
> > checking, code completion, etc.).
> >
> > I think if there are already parsers written in elisp they should be
> > intergrated into the official emacs distribution (e.g. in directory
> > lisp/parsers), so that packages can use them to understand the code
> > better.
> >
>
> Have a look at
>
> http://cedet.sourceforge.net/
>
> The combination of semantic and cedet is, amongst other things, aimed
> at providing parse based functionality for emacs. some of this is (I
> think) going to be bundled in with emacs 22. The idea is to provide a
> more powerful devleopment environment that can do things like code
> completion based on more than just abbrevs and dynamic completion
> based on recently used keywords and regexp.
>
> The problem with parse based analysis is that you need an in-built
> parser for all the languages that the editor is used to develop in and
> this is not a trivial task. I suspect some sort of plugin architecture
> that is able to use stand-alone parses for some language of interest
> would probably be the way to go as it is unlikely even a small subset
> of the languages devleoped within an emacs environment can have a
> parser developed in elisp which is readily maintained.

I think that would be a very difficult approach.  If Emacs wants to
keep it's portability then interfacing it with other programs is
difficult.  It's not as though the language parsers in compilers etc
could be reused anyway, they are inappropriate.

As far as I can see implementing a data-driven GLR parser into Emacs is
the way to go.  That way the parser could interface directly with the
buffer.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
       [not found]                       ` <mailman.1672.1165586758.2155.help-gnu-emacs@gnu.org>
@ 2006-12-08 14:17                         ` Robert Thorpe
  0 siblings, 0 replies; 28+ messages in thread
From: Robert Thorpe @ 2006-12-08 14:17 UTC (permalink / raw)


Leo wrote:
> On FRI, 8 DEC 2006, Tim X. wrote:
>
> > Have a look at
> >
> > http://cedet.sourceforge.net/
> >
>
> But this one is really slow.
>
> > ... going to be bundled in with emacs 22.
>
> And I only see speedbar bundled.

This isn't new, Speedbar has been bundled since Emacs 20.3 at least.
It's quite useful for some things.  I used to use it, I will again once
I get a really big monitor, so it doesn't take too much space.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-07 19:29                   ` spamfilteraccount
@ 2006-12-08 14:43                     ` Robert Thorpe
  0 siblings, 0 replies; 28+ messages in thread
From: Robert Thorpe @ 2006-12-08 14:43 UTC (permalink / raw)


spamfilteraccount@gmail.com wrote:
> Stefan Monnier wrote:
> > > Good to know. I thought font-lock was implemented in elisp and didn't
> > > bother to check.
> >
> > If you look at the code you'll probably think it's implemented in elisp.
> > But if you look at a profile, you'll probably see that it's spending most of
> > its time in either text-property manipulation functions, or
> > regexp-matching, or parse-partial-sexp, all of which are written in C.
>
> You wrote VIM is a little faster than Emacs.

No, I said that.

> Is it because of the time
> spent in the elisp part in emacs or the C part itself is implemented
> more efficiently in VIM?

I doubt it's the Elisp part since it is not normally a major component
of the runtime in font-locking.  Also, Vim has it's own simple language
for describing syntax highlighting.

> If it's the latter then the C implementations could be compared to see
> what VIM does better.

Yes.  There are many bits of Emacs where the performance could be
improved.

What is the problem you're seeing with performance anyway?

Generally to even see the font-locking occur I have to set up some
quite artificial situation, and the computers I use aren't that modern
or fast.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08  8:36                       ` spamfilteraccount
@ 2006-12-08 16:17                         ` Robert Thorpe
  2006-12-08 21:14                           ` spamfilteraccount
  2006-12-09  2:06                           ` Stefan Monnier
  0 siblings, 2 replies; 28+ messages in thread
From: Robert Thorpe @ 2006-12-08 16:17 UTC (permalink / raw)


spamfilteraccount@gmail.com wrote:
> Tim X wrote:
> >
> > The problem with parse based analysis is that you need an in-built
> > parser for all the languages that the editor is used to develop in and
> > this is not a trivial task. I suspect some sort of plugin architecture
> > that is able to use stand-alone parses for some language of interest
> > would probably be the way to go as it is unlikely even a small subset
> > of the languages devleoped within an emacs environment can have a
> > parser developed in elisp which is readily maintained.
>
> I think too that some kind of bridge or plugin architecture is the
> answer.
>
> Lots of languages provide access to syntax trees in some form (python,
> java, etc.), so it would be much simpler to use their native
> implementation than reinveinting everything in elisp.

That isn't really appropriate though.

Consider the following.  When I open a project I generally open all
files in the directory by doing something like C-x C-x project_foo/*.c
.  I also use save-places, so point appears in each file wherever I
left it last.  I think both of these are quite common ways to use
Emacs.

Doing this with normal parsing technology is difficult.  If the editor
just feeds every file into the external parser then back into the
editor then this will be a lot of work.  It would be similar to the
work of a compiler doing a full rebuild.  In fact it would be less
because parsing for font-locking involves nothing similar to compiler
optimization or code generation.  But it would still be a big task.  A
much better strategy is to start parsing at point in each file and only
parse a screenful at a time, doing this with an external parser would
be very hard.

There are other problems.  What if a part of the code is incorrect?
Imagine, in C for example, if a function were written "foo (;" on line
10.  The effect of the error would propagate down far away from where
it occurs, even line 300 might be treated wrongly.  The parser would
have to cope with this eventuality.

Also, in many languages there are bits of the meaning that depend on
the names used.  In C for example the code " (foo) (bar)" means
something different if foo is a type than it does if it's an
identifier.  The C compiler can cope with this because it tracks all
typedefs and identifiers through not only the current file but those
included in it with #include.   The only way for a font-lock system
based on a normal parser to deal with this situation would be for it to
read all the include files, which may not even be present.

Compiler parsers and font-locking/navigating code have different
intentions.  Compiler parsers must be fast when handling a whole file,
and they must generate accurate error messages.  Font-locking code must
be fast when starting at any arbitrary part of the code, and it must
tolerate incomplete information and errors.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08 16:17                         ` Robert Thorpe
@ 2006-12-08 21:14                           ` spamfilteraccount
  2006-12-09  2:08                             ` Stefan Monnier
  2006-12-09  2:06                           ` Stefan Monnier
  1 sibling, 1 reply; 28+ messages in thread
From: spamfilteraccount @ 2006-12-08 21:14 UTC (permalink / raw)



Robert Thorpe wrote:
>
> Doing this with normal parsing technology is difficult.  If the editor
> just feeds every file into the external parser then back into the
> editor then this will be a lot of work.

Yes, this is a problematic part. I was thinking about feeding only code
snippets to the external parser, but even determining what snippet
should be fed from the current source code would need some kind of
parsing, so using external parsers might not be feasible after all.

> Compiler parsers and font-locking/navigating code have different
> intentions.  Compiler parsers must be fast when handling a whole file,
> and they must generate accurate error messages.  Font-locking code must
> be fast when starting at any arbitrary part of the code, and it must
> tolerate incomplete information and errors.

Of course, and I wasn't thinking of using the existing compiler as is,
rather utilizing somehow the existing infrastructure in the compiler if
it's accessible to implement partial parsing. But given the problems
discussed above it may not be the way to go.

If parsing needs to be implemented in the editor and every editor must
have it's own implementation then at least the concepts could be
shared.

I mean there should be a wiki or something about discussing issues of
partial parsing for a particular language (java, c++, etc.), instead of
everyone reinventing the wheel differently.

For example, one could check the current implementation in Eclipse of
java code completion and parsing, before embarking to implement the
same thing again and the same goes for other open source editors
supporting other languages.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08  8:10                     ` Tim X
                                         ` (3 preceding siblings ...)
       [not found]                       ` <mailman.1672.1165586758.2155.help-gnu-emacs@gnu.org>
@ 2006-12-08 21:17                       ` spamfilteraccount
  4 siblings, 0 replies; 28+ messages in thread
From: spamfilteraccount @ 2006-12-08 21:17 UTC (permalink / raw)



Tim X wrote:
> "spamfilteraccount@gmail.com" <spamfilteraccount@gmail.com> writes:
>
> Have a look at
>
> http://cedet.sourceforge.net/
> 

The last release date is June 2005. Is cedet dead?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08 16:17                         ` Robert Thorpe
  2006-12-08 21:14                           ` spamfilteraccount
@ 2006-12-09  2:06                           ` Stefan Monnier
  2006-12-09  3:24                             ` Lennart Borgman
  1 sibling, 1 reply; 28+ messages in thread
From: Stefan Monnier @ 2006-12-09  2:06 UTC (permalink / raw)


> A much better strategy is to start parsing at point in each file and only
> parse a screenful at a time, doing this with an external parser would be
> very hard.

Starting the parse "at point" can be terribly difficult since you don't know
the state of the parser at that point.  You can infer the state by parsing
backward (this is what the indentation code does typically), or by jumping
to some previous spot assumed to have some known parsing state and then
parse forward.

If all you care about is indentation, then parsing backward gives the best
results in terms of being robust in the face of partially incorrect code
(because it only parses just as far back as necessary to determine
indentation, so the resulting indentation behavior has some kind of
locality quality to it).

If you really need the full state because you're going to keep parsing
forward some arbitrary distance, then you're better off jumping back to
a "safe spot" and parsing forward from there.  In some languages it's not so
easy to figure out what are such safe spots other than the beginning of
the file.  But maybe with enough parse-state caching (à la syntax-ppss in
Emacs-22, although beefed up to keep the state of a real parser), and with
a fast enough forward parsing code, you can get away with always parsing
"from the beginning of the buffer", although it then suffers from problems
when faced with invalid/misunderstood source code.

Of note: parsing backward can be fiendishly difficult because languages are
designed without paying any attention to it.

> There are other problems.  What if a part of the code is incorrect?
> Imagine, in C for example, if a function were written "foo (;" on line
> 10.  The effect of the error would propagate down far away from where
> it occurs, even line 300 might be treated wrongly.  The parser would
> have to cope with this eventuality.

What if it's correct, but only after passing through some special
purpose preprocessor?

> Also, in many languages there are bits of the meaning that depend on
> the names used.  In C for example the code " (foo) (bar)" means
> something different if foo is a type than it does if it's an
> identifier.  The C compiler can cope with this because it tracks all
> typedefs and identifiers through not only the current file but those
> included in it with #include.   The only way for a font-lock system
> based on a normal parser to deal with this situation would be for it to
> read all the include files, which may not even be present.

> Compiler parsers and font-locking/navigating code have different
> intentions.  Compiler parsers must be fast when handling a whole file,
> and they must generate accurate error messages.  Font-locking code must
> be fast when starting at any arbitrary part of the code, and it must
> tolerate incomplete information and errors.

Note that this last point can be seen as an advantage: we don't have to
detect invalid code.

I believe there are two alternatives: one is the way taken by things like
Visual Haskell where you integrate the editor and the build system, so the
editor can run the parser with the exact same args as the compiler
would/will.  I think this is a very workable solution and I hope it will be
developped in Emacs as well.

If you decide not to integrate the editor so closely with the build system
(i.e. follow the way Emacs currently works), then you really can't reliably
parse the buffer and thus can't reuse existing parsers.  So you end up
having to design a new parser for every language.  For a real programming
language, writing its grammar is a non-trivial task, so it can be
a problem.  The only thing that would save us is that we can be as
permissive as we want: we don't have to reject invalid programs.  Better: we
can presume that the code is valid, and if it isn't we can anything we
please.  I hope Emacs will also develop in this direction.  One good step in
that direction would be to spice up syntax-table so that the basic syntactic
elements can be bigger than a single-char (e.g. so as to handle
begin...end).  I've recently experimented with the use of an
infix-precedence system where each infix "operator" can have a different
left and right precedence, and then to try and use that to parse things like
if/then/else (where `then' and `else' are seen as "infix" operators).
It doesn't seem quite powerful enough for what I want (to indent Coq code
in this case), but sufficiently close that maybe some minor extension will
get me there.


        Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08 21:14                           ` spamfilteraccount
@ 2006-12-09  2:08                             ` Stefan Monnier
  0 siblings, 0 replies; 28+ messages in thread
From: Stefan Monnier @ 2006-12-09  2:08 UTC (permalink / raw)


> I mean there should be a wiki or something about discussing issues of
> partial parsing for a particular language (java, c++, etc.), instead of
> everyone reinventing the wheel differently.

Indeed.  The only such info I know of is a paper about the implementation of
Visual Haskell.  I'd be interested to hear about the techniques
used elsewhere.


        Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-08 14:00                       ` Robert Thorpe
@ 2006-12-09  2:10                         ` Stefan Monnier
  0 siblings, 0 replies; 28+ messages in thread
From: Stefan Monnier @ 2006-12-09  2:10 UTC (permalink / raw)


> As far as I can see implementing a data-driven GLR parser into Emacs is
> the way to go.  That way the parser could interface directly with the
> buffer.

You may be right.  After all, a GLR grammar for a language can be easily
turned into a GLR grammar for the reversed language, so it could also be
used for backward parsing, which I find to be important for indentation.


        Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: State-machine based syntax highlighting
  2006-12-09  2:06                           ` Stefan Monnier
@ 2006-12-09  3:24                             ` Lennart Borgman
  0 siblings, 0 replies; 28+ messages in thread
From: Lennart Borgman @ 2006-12-09  3:24 UTC (permalink / raw)
  Cc: help-gnu-emacs

Stefan Monnier wrote:
>> A much better strategy is to start parsing at point in each file and only
>> parse a screenful at a time, doing this with an external parser would be
>> very hard.
>>     

I know nearly nothing about those things, but I have noticed that 
nxml-mode uses something I think was called an rng parser. It catches 
syntax errors in xml files as you type.

Is that something useful?

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2006-12-09  3:24 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-07  6:14 State-machine based syntax highlighting spamfilteraccount
2006-12-07 10:53 ` Robert Thorpe
2006-12-07 11:56   ` spamfilteraccount
2006-12-07 12:42     ` Robert Thorpe
2006-12-07 14:27       ` spamfilteraccount
2006-12-07 14:39         ` Robert Thorpe
2006-12-07 17:02           ` spamfilteraccount
2006-12-07 17:42             ` Stefan Monnier
     [not found]             ` <mailman.1644.1165513359.2155.help-gnu-emacs@gnu.org>
2006-12-07 18:35               ` spamfilteraccount
2006-12-07 18:57                 ` Robert Thorpe
2006-12-07 20:24                   ` Perry Smith
2006-12-08  7:33                   ` spamfilteraccount
2006-12-08  8:10                     ` Tim X
2006-12-08  8:36                       ` spamfilteraccount
2006-12-08 16:17                         ` Robert Thorpe
2006-12-08 21:14                           ` spamfilteraccount
2006-12-09  2:08                             ` Stefan Monnier
2006-12-09  2:06                           ` Stefan Monnier
2006-12-09  3:24                             ` Lennart Borgman
2006-12-08 13:14                       ` Leo
2006-12-08 14:00                       ` Robert Thorpe
2006-12-09  2:10                         ` Stefan Monnier
     [not found]                       ` <mailman.1672.1165586758.2155.help-gnu-emacs@gnu.org>
2006-12-08 14:17                         ` Robert Thorpe
2006-12-08 21:17                       ` spamfilteraccount
     [not found]                   ` <mailman.1653.1165523111.2155.help-gnu-emacs@gnu.org>
2006-12-08 10:01                     ` Robert Thorpe
2006-12-07 19:02                 ` Stefan Monnier
2006-12-07 19:29                   ` spamfilteraccount
2006-12-08 14:43                     ` Robert Thorpe

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.