From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Steve Yegge <stevey@google.com>
Newsgroups: gmane.emacs.devel
Subject: Re: "Font-lock is limited to text matching" is a myth
Date: Mon, 10 Aug 2009 23:47:37 -0700
Message-ID: <9c768dc60908102347v57bdf38ara9fe2179f68c07e4@mail.gmail.com>
References: <7b501d5c0908091634ndfba631vd9db6502db301097@mail.gmail.com>
	<buofxc05p1l.fsf@dhlpc061.dev.necel.com>
	<aa6b5cbe0908092251u670fbd3bg2fc4c14857d32c17@mail.gmail.com>
	<200908101335.24002.danc@merrillprint.com>
	<e01d8a50908101104i5081852bh6ecc7d900d87d19e@mail.gmail.com>
	<87my67s8mr.fsf@randomsample.de>
	<e01d8a50908101351l1af03242o84513de67eaf46b2@mail.gmail.com>
	<1249942011.29022.15.camel@projectile.siege-engine.com>
	<e01d8a50908101519k75883081h1f8332b7807b7f49@mail.gmail.com>
	<1249955428.29022.186.camel@projectile.siege-engine.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001636164225fb42000470d81398
X-Trace: ger.gmane.org 1249973333 2301 80.91.229.12 (11 Aug 2009 06:48:53 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 11 Aug 2009 06:48:53 +0000 (UTC)
Cc: Daniel Colascione <danc@merrillpress.com>,
	David Engster <deng@randomsample.de>,
	Daniel Colascione <danc@merrillprint.com>,
	Lennart Borgman <lennart.borgman@gmail.com>,
	Deniz Dogan <deniz.a.m.dogan@gmail.com>,
	Stefan Monnier <monnier@iro.umontreal.ca>,
	Leo <sdl.web@gmail.com>, Miles Bader <miles@gnu.org>
To: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Aug 11 08:48:44 2009
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1Mal9c-0004m7-UF
	for ged-emacs-devel@m.gmane.org; Tue, 11 Aug 2009 08:48:43 +0200
Original-Received: from localhost ([127.0.0.1]:60858 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1Mal9b-0006e1-70
	for ged-emacs-devel@m.gmane.org; Tue, 11 Aug 2009 02:48:15 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1Mal9T-0006ds-B2
	for emacs-devel@gnu.org; Tue, 11 Aug 2009 02:48:07 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1Mal9O-0006d2-1C
	for emacs-devel@gnu.org; Tue, 11 Aug 2009 02:48:06 -0400
Original-Received: from [199.232.76.173] (port=54630 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Mal9N-0006cz-Rh
	for emacs-devel@gnu.org; Tue, 11 Aug 2009 02:48:01 -0400
Original-Received: from mx20.gnu.org ([199.232.41.8]:45949)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from <stevey@google.com>)
	id 1Mal99-0000vZ-HM; Tue, 11 Aug 2009 02:47:48 -0400
Original-Received: from smtp-out.google.com ([216.239.33.17])
	by mx20.gnu.org with esmtp (Exim 4.60)
	(envelope-from <stevey@google.com>)
	id 1Mal95-0004DA-V1; Tue, 11 Aug 2009 02:47:45 -0400
Original-Received: from wpaz24.hot.corp.google.com (wpaz24.hot.corp.google.com
	[172.24.198.88]) by smtp-out.google.com with ESMTP id n7B6leji008460;
	Tue, 11 Aug 2009 07:47:40 +0100
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=google.com; s=beta;
	t=1249973261; bh=mLruF0oMtAdp63OMI73FR0yXu34=;
	h=DomainKey-Signature:MIME-Version:In-Reply-To:References:Date:
	Message-ID:Subject:From:To:Cc:Content-Type:X-System-Of-Record; b=I
	/LV0T+1FjApJAjbZhuqQjvUR3xrZrAUGyxHX8ZDBmJQXKtgOs5Cne9PZKddDTi6IVxm
	nt760fwLQ4xI8FgBlA==
DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to:
	cc:content-type:x-system-of-record;
	b=mnBcVtiwRY9x2JJ4VVqttRmxQlPx4EkqJ1kStMANZsB3L6bENdV36RALvGNra08s7
	CpL0YrLuQAU8AwgvC+CCA==
Original-Received: from ywh37 (ywh37.prod.google.com [10.192.8.37])
	by wpaz24.hot.corp.google.com with ESMTP id n7B6lbuH026805;
	Mon, 10 Aug 2009 23:47:38 -0700
Original-Received: by ywh37 with SMTP id 37so5414812ywh.28
	for <multiple recipients>; Mon, 10 Aug 2009 23:47:37 -0700 (PDT)
Original-Received: by 10.90.34.10 with SMTP id h10mr4807565agh.96.1249973257322; Mon, 
	10 Aug 2009 23:47:37 -0700 (PDT)
In-Reply-To: <1249955428.29022.186.camel@projectile.siege-engine.com>
X-System-Of-Record: true
X-Detected-Operating-System: by mx20.gnu.org: GNU/Linux 2.6 (newer, 3)
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6,
	seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:114040
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/114040>

--001636164225fb42000470d81398
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hello all,

Thanks for opening this can of, er, threads.  I was going to ask about
these things myself soon in any case, because it's clear that js2-mode
is not doing a very effective job of surfacing its rich information in
Emacs.  This is partly my fault, but it is also partly due to some
issues with font-lock that I'll describe in nauseating detail.

There are several important ideas being conflated in this thread that
I think need to be teased apart before we can talk responsibly about
any of them.  I've called out the top five conflations in sections
below delimited by roman numerals.

This is all in some sense an elaboration of what Eric Ludlam just
posted, to which I can only add my miserable +1.

Stephen Eilert wrote:
> I do not think that was done without a very good reason (and there's
  a lengthy post explaining it), unless the author is a complete
  masochist.

I don't think of myself that way.  Here, as requested, is a lengthy post
explaining my approach.  For the record, it could have been much
lengthier, and I have lengthy replies ready for all your objections and
concerns.  (Just in case you were wondering.)

I really do want to get this resolved, though.

I. Asynchronous parsing

js2-mode performs both syntactic and (some) semantic analysis.  It
knows, for instance, when you're using a symbol that's not defined in
its file.  js2-mode does not currently understand project structure,
but I'm doing some work in this area, and it may at some point gather
semantic information collected from several files.

Because this analysis requires parsing the entire file at least once
(see my discussion of partial/incremental parsing below), and it may
someday involve looking at symbol tables from other files, it seemed
best to run the parse asynchronously, so as not to interfere with the
user's editing.

One byproduct of having an accurate parser and symbol table is that
you can obtain style runs with relatively small effort, so js2-mode
does its own highlighting.  The downside is that this highlighting
information is unavailable at font-lock time, and it is not available
piecewise -- it's all-or-nothing.

There is a relatively simple alternative that might appease Daniel:
I could have js2-mode simply not do any highlighting by default,
except for errors and warnings.  We'd use whatever highlighting is
provided by espresso-mode, and users would be able to choose between
espresso-highlighting and js2-mode highlighting.  With the former,
they'd get "instantaneous" font-locking, albeit not as rich as what
js2-mode can provide.

This would be trivial to change.  I am actively maintaining js2-mode,
and the only reason I haven't checked in any changes since my initial
commit to the trunk is inexperience:  I'm trying to get a handle on how
many changes people tend to aggregate before checking in a change to
any given mode.  But I have several fixes (including some patches
contributed from users) that are ready to commit, and more on the way.

Errors and warnings would still need to be asynchronous (if they're
enabled).  So, too, would the imenu outline and my in-progress
buffer-based outline, which is somewhat nicer than the IMenu one.

But I think the main objection to js2-mode revolves around its
highlighting, correct?  If so, AND if we can solve the font-lock
integration issues, AND if we can fix the multi-mode issues (II
below), then I'm hopeful that js2-mode might become a reasonable
choice as the default editing mode for JavaScript.

I think espresso-mode is a fine fallback position.  Anything but
java-mode!  The default today is java-mode, and I had no qualms about
replacing it as the default for JavaScript.

Note: diagnostic messages in js2-mode are highlighted using overlays.
I tried using overlays for all highlighting but it was unacceptably
slow and had a tendency to crash Emacs.  But there are usually not
prohibitively many errors and warnings, since the error-recovery
algorithm is somewhat coarse-grained.  So error-reporting works
independently of font-lock.

II. Multi-mode support

JavaScript is especially needful of mumamo (or equivalent) multi-mode
support, because much of the JavaScript in the wild is embedded in
HTML, in template files, even in strings in other languages.

js2-mode does not support mumamo (or mmm-mode, which which I am
currently more familiar) because js2-mode's lexer needs to support
ignoring parts of the buffer.  I do not think this would be very
hard to implement, but I have not done it yet.

If I don't get to it before the next version of Emacs launches, then I
think this should effectively disqualify js2-mode from being the
default JavaScript mode.  It would be an inconsistent user experience
to have one JavaScript mode in .js files and another mode for
JavaScript inside multi-mode-enabled files.

I'm ready to give it a try, though, and I'll ping Lennart offline about
integrating the two somehow.

III. Incremental and partial parsing

Lennart and others have asked whether it is possible for js2-mode to
support partial or incremental parsing.  The short answer is
"incremental: yes; partial: no".

nxml-mode, last I checked, does incremental parsing.  It parses ahead
in the buffer, but then stops and saves its state.  If you jump forward
in the buffer, it resumes and continues the parse until some point
beyond the section you're viewing.

js2-mode could do it this way without much additional effort.  I chose
not to because once you've decided to use background parsing, it
doesn't seem like an especially useful optimization.  But I could see
it being helpful in some cases, such as when you're editing near the
top of a large file -- as long as the whole file isn't encased in some
top-level expression, which unfortunately is often the case in JS.

Partial parsing is a different beast entirely.  The goal of a partial
parser is to re-parse the minimum amount necessary, given some region
that has changed.  I've dug into this a bit, because originally I
wanted to support it in js2-mode.  I even made some progress on an
implementation.

While a few production parsers (for Java and JavaScript) have
implemented partial parsing, the vast majority of them do not support
it -- instead, they re-parse from the top.  They do this because the
incremental benefit of partial parsing is debatable, assuming you're
time- and resource-constrained, as most of us are.

I took a close look at Eclipse and IntelliJ, and even asked some
of their users to characterize the highlighting behavior of the IDE.
Without exception, the IDE users had internalized a ~1000 ms delay
in highlighting and error reporting as part of their normal workflow,
and they uniformly described it as "instant{aneous}" until I made
them time it.

I've been an Emacs user for 20+ years now, and like many I found
the idea of a parsing delay to be somewhere between "undesirable"
and "sickening".  But the majority of programmers today have
apparently learned not to notice delays of ~1sec as long as it
never interferes with their typing or indentation (see IV below).

So after looking at my ~8000 lines of elisp devoted to parsing
JavaScript, I weighed it and decided not to support partial parsing.
It's certainly possible to support it, but I think my time would be
better spent on things that average users are more likely to notice.

YMMV, of course.

The upshot is that if I'm going to support mumamo, it will need
to work within js2-mode's existing full-reparse framework.  I can
think of various ways to make it work, though, and as I mentioned
I'll talk to Lennart about it.

IV.  Indentation

The indentation in js2-mode is broken.  I'll be the first to say it.

It is based on the indentation in Karl Langstrom's mode, which does a
better job for JavaScript than any indenter based on cc-engine, but
that doesn't mean it's a good job.  And it's essentially unconfigurable.

espresso-mode shares this problem, which means that for this
important use case it is not an improvement over js2-mode.

Daniel's objections to js2-mode's non-interaction with font-lock
apply equally to the non-interaction with cc-engine's indentation
configuration system.  The indent configuration for JavaScript should
share as many settings as practical with cc-mode.

I actually made a serious attempt to generate the `c-style-alist'
data structure for js2-mode using the parse tree, but ran into three
issues:

  1) it's much harder than I thought it would be, even with a full
     parse tree available.  I had some 2000 lines of elisp invested
     in it when I pooped out, to be perfectly frank.

  2) `c-style-alist' (like font-lock) does not have enough semantic
      variables to encompass the range of indentation contexts that
      JavaScript programmers care about.  I think we'd need to add
      5-10 more, although it's been 18 months since I looked into it.

  3) indentation in "normal" Emacs modes also runs synchronously as
     the user types.  Waiting 500-800 msec or more for the parse to
     finish is (I think) not acceptable for indentation.  For small
     files the parse time is acceptable, but it would not be generally
     scalable.

#3 is the reason I gave up on #1.  It didn't seem to be worth the
effort to produce an accurate but slow indenter.

I don't know exactly how to solve this problem.  I have lots of
ideas, but it appears there are few low-hanging fruit in this space.

V. Font Lock framework design problems

There seems to be a common misconception flitting about to the
effect that font-lock is perfect and will never need to change.

This is a somewhat paradoxical viewpoint in view of the corpses
littering the path to jit-lock, which include font-lock, fast-lock,
lazy-lock, and vapor-lock.  Each decade we've had a cadre of people
claiming that *-lock meets everyone's needs, and then it gets rewritten
anyway.

So it's hard to understand how it remains such a popular viewpoint.

I'll make yet another attempt to dispel it, since once we're past the
emotional stumbling blocks, font-lock may be able to evolve again.

Va) Inadequate/insufficient style names

There are not enough font-lock faces to represent all the semantic
style runs that are identifiable to "real" language analyzers.
js2-mode makes several semantic distinctions not available in most
Emacs modes, although such distinctions are available in JDEE and
other Cedet-enabled modes, so js2-mode is by no means alone in its
needs.

In addition to the autoloaded font-lock faces, which js2-mode uses
whenever possible, js2-mode defines several new faces, including:

  * function parameters
  * "class" instance members (in JS, prototype and instance props)
  * local variables
  * undeclared variables
  * private members (although I implemented it poorly -- see below)
  * html/xml tags, attr names and delimiters -- used both for html
    in jsdoc comments and for E4X literals
  * doc tags such as those typically found in javadoc/jsdoc comments
  * warnings, errors, and informational diagnostics

I do not expect that this set is all-inclusive -- over time as js2-mode
and similar modes get smarter, they will be able to make other
semantic distinctions that users may wish to customize independently.
Given that Emacs is the most configurable editor on the planet, I do
not see any reason to entertain arguments to the contrary.

Vb) Ad-hoc default faces that are not being autoloaded

There are some modes (e.g. sgml-mode, html-mode, nxml-mode) that
define their own versions of some of the xml/html faces, but it did
not seem right to make js2-mode 'require one of these modes just to
get at ad-hoc "standard" definitions for these faces.

We should define standard faces for xml/html tags and entities, and
for any other faces that are effectively defined by 2 or more modes.

Vc) Additional semantic styles not needed by JavaScript

I have other language modes in progress, and together they define an
ever larger set of semantic styles.  The set of available font-lock
names should try to encompass the _union_ of the needs of most
languages, not the intersection.  There should, for instance, be a
font-lock-symbol-face for languages with distinguished symbols such
as Lisp, Scheme and Ruby.

I think this is relatively easy to fix, provided a little thought
goes into choosing the new faces.  Vd and Ve below should help
clarify why it requires greater than zero thought.

Vd) Composable semantic styles

Some font-lock faces represent "primary" semantic roles, in a vague
way.  For instance, there is a font-lock-function-name-face, and
this is different from font-lock-variable-name-face.  While in some
languages (including JavaScript) the distinction is not necessarily
exact, they can usually be reconciled -- e.g. being a function is
a more "important" property of an identifier than being a variable.

Most of the font-lock faces represent very common primary roles:
strings, comments, keywords, types, preprocessor macros.  But not all.
font-lock-constant face is actually orthogonal to the primary role.
A class or method or parameter can be const or non-const in some
languages.

The semantic notion of public/private/protected/package/friend
visibility is another example.  So is "abstract"/"pure virtual".

Emacs supports composable faces (a style run may have multiple
faces, and the attributes compose according to predefined rules),
but font-lock provides neither consistent nor adequate support for
this notion.

Ve) Ambiguous semantic styles

At least one of the face names is ambiguous -- it's not clear what
font-lock-builtin-face is actually supposed to highlight.  The result
is that different language modes use it for different kinds of
entities.  If you customize the face for one mode, you may wind up
with unsatisfying results in another mode due to the differences
in relative weighting/distribution of semantic types across languages.

As a hypothetical example, someone might enhance python-mode to
use font-lock-builtin-face to highlight True/False/None and possibly
"self", since they're not keywords but they are all handled specially
by the runtime.  (font-lock-type-face might be better for this, but
since they're not really classes, you could argue it either way).
These tokens appear relatively infrequently in Python.  If someone
else were to use it to highlight functions implemented in C in elisp,
there would be a lot more of that face appearing in elisp buffers,
and it might not be easy to choose one face that looks nice in both
situations.

Regardless of the fate of js2-mode, font-lock needs to add more
semantic faces.  By default these new faces might simply inherit face
attributes from their "syntactic parents" -- e.g. the faces for
locals, parameters, instance and static vars might all inherit the
settings for `font-lock-variable-name-face'.  But users should be
able to differentiate among them when the information is available.

Vf) No font-lock interface for setting exact style runs

I could be mistaken here -- if so, please correct me.

My limited understanding of font-lock and its main entry-point
mechanisms such as font-lock-keywords and font-lock-apply-highlight,
all of which use the MATCH-HIGHLIGHT data structure, is that they
are not quite powerful enough for my needs in their current incarnation.

This issue is independent of asynchronous parsing -- I think that
even if my parser were instantaneous, I would still have this issue.

The problem is that I need a way, in a given font-lock redisplay, to
say "highlight the region from X to Y with text properties {Z}".

This use case does not seem like it should be inordinately difficult
to support, but it does not seem to be supported today.

When I assert that it's not possible, I understand that it's
_theoretically_ possible.  Given a JavaScript file with 2500 style
runs, assuming I had that information available at font-lock time, I
could return a matcher that contains 2500 regular expressions, each
one of which is tailored to match one and exactly one region in the
buffer.

In practice, however, I am not aware of a way to do this that is
either clean or efficient.

If this simple feature were supported, I would have a great deal more
incentive to try to get my parsing to be fast enough to work within
the time constraints users expect from font-lock.

Vg) Lack of differentiation between mode- and minor-mode styles

One of the most common complaints from the thousands of users of
js2-mode, most of whom have exercised enough self-restraint to use the
term "work in progress" in preference to "abomination", is that
js2-mode has poor support for minor modes that do their work with
font-lock -- 80-column highlighters being a popular example, although
there are others.

The fundamental problem here is that the font-lock framework does not
differentiate between the mode's syntax highlighting and the keywords
installed by minor modes and by user code.  Instead, it merges them.

As far as I can tell, the officially supported mechanism for
adding additional font-lock patterns is `font-lock-add-keywords'.
This either appends or prepends the keywords to the defaults.

It might be possible to reverse-engineer it, for instance by manually
diffing the buffer's font-lock-defaults and font-lock-keywords and
trying to figure out which ones were added by participants other than
the major mode.  Even if it's possible, it's not clear that it always
works now, and would always work in the future.

For one thing, it's possible (as Daniel observes) to bypass this
mechanism and call font-lock-apply-highlight directly, which makes
the reverse-engineering even more cumbersome and fragile.

(Vf) is the reason (Vg) is a problem for js2-mode.  font-lock-defaults
does not seem to be a very satisfactory way to apply 2000-10000
precise style runs to a buffer, so I do all my own highlighting,
and it doesn't include style-run contributions from minor modes.

I've made some halfhearted attempts to hack around the problem, but
they've proven fragile.  If font-lock were to support (Vf), then I
think (Vg) should "just work".

VI.  Summary

I've called out some of the main integration issues I've encountered.
I've penned several major and minor language modes, not just js2-mode,
and I've chosen to whine here about the problems that could best be
classified as "problem themes".

I'm around, and I'm available for nontrivial work.  If group consensus
is that js2-mode isn't ready yet, I'm happy to keep hacking on it and
taking user patches and feedback until Emacs 24 rolls around.

But it would be nice to have more direct support for modes like mine.
I'm willing to do my end of it, but I'm always oversubscribed, and I've
already signed up to support mouse-enter and mouse-left text props
as part of another js2-mode-related thread.

So a little help would go a long way.

-steve

--001636164225fb42000470d81398
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<div><div>Hello all,</div><div><br></div><div>Thanks for opening this can o=
f, er, threads. =A0I was going to ask about</div><div>these things myself s=
oon in any case, because it&#39;s clear that js2-mode</div><div>is not doin=
g a very effective job of surfacing its rich information in</div>
<div>Emacs. =A0This is partly my fault, but it is also partly due to some</=
div><div>issues with font-lock that I&#39;ll describe in nauseating detail.=
</div><div><br></div><div>There are several important ideas being conflated=
 in this thread that</div>
<div>I think need to be teased apart before we can talk responsibly about</=
div><div>any of them. =A0I&#39;ve called out the top five conflations in se=
ctions</div><div>below delimited by roman numerals.</div><div><br></div><di=
v>
This is all in some sense an elaboration of what Eric Ludlam just</div><div=
>posted, to which I can only add my miserable +1.</div><div><br></div><div>=
Stephen Eilert wrote:</div><div>&gt; I do not think that was done without a=
 very good reason (and there&#39;s</div>
<div>=A0=A0a lengthy post explaining it), unless the author is a complete</=
div><div>=A0=A0masochist.</div><div><br></div><div>I don&#39;t think of mys=
elf that way. =A0Here, as requested, is a lengthy post</div><div>explaining=
 my approach. =A0For the record, it could have been much</div>
<div>lengthier, and I have lengthy replies ready for all your objections an=
d</div><div>concerns. =A0(Just in case you were wondering.)</div><div><br><=
/div><div>I really do want to get this resolved, though.</div><div><br></di=
v>
<div>I. Asynchronous parsing</div><div><br></div><div>js2-mode performs bot=
h syntactic and (some) semantic analysis. =A0It</div><div>knows, for instan=
ce, when you&#39;re using a symbol that&#39;s not defined in</div><div>its =
file. =A0js2-mode does not currently understand project structure,</div>
<div>but I&#39;m doing some work in this area, and it may at some point gat=
her</div><div>semantic information collected from several files.</div><div>=
<br></div><div>Because this analysis requires parsing the entire file at le=
ast once</div>
<div>(see my discussion of partial/incremental parsing below), and it may</=
div><div>someday involve looking at symbol tables from other files, it seem=
ed</div><div>best to run the parse asynchronously, so as not to interfere w=
ith the</div>
<div>user&#39;s editing.</div><div><br></div><div>One byproduct of having a=
n accurate parser and symbol table is that</div><div>you can obtain style r=
uns with relatively small effort, so js2-mode</div><div>does its own highli=
ghting. =A0The downside is that this highlighting</div>
<div>information is unavailable at font-lock time, and it is not available<=
/div><div>piecewise -- it&#39;s all-or-nothing.</div><div><br></div><div>Th=
ere is a relatively simple alternative that might appease Daniel:</div>
<div>I could have js2-mode simply not do any highlighting by default,</div>=
<div>except for errors and warnings. =A0We&#39;d use whatever highlighting =
is</div><div>provided by espresso-mode, and users would be able to choose b=
etween</div>
<div>espresso-highlighting and js2-mode highlighting. =A0With the former,</=
div><div>they&#39;d get &quot;instantaneous&quot; font-locking, albeit not =
as rich as what</div><div>js2-mode can provide.</div><div><br></div><div>
This would be trivial to change. =A0I am actively maintaining js2-mode,</di=
v><div>and the only reason I haven&#39;t checked in any changes since my in=
itial</div><div>commit to the trunk is inexperience: =A0I&#39;m trying to g=
et a handle on how</div>
<div>many changes people tend to aggregate before checking in a change to</=
div><div>any given mode. =A0But I have several fixes (including some patche=
s</div><div>contributed from users) that are ready to commit, and more on t=
he way.</div>
<div><br></div><div>Errors and warnings would still need to be asynchronous=
 (if they&#39;re</div><div>enabled). =A0So, too, would the imenu outline an=
d my in-progress</div><div>buffer-based outline, which is somewhat nicer th=
an the IMenu one.</div>
<div><br></div><div>But I think the main objection to js2-mode revolves aro=
und its</div><div>highlighting, correct? =A0If so, AND if we can solve the =
font-lock</div><div>integration issues, AND if we can fix the multi-mode is=
sues (II</div>
<div>below), then I&#39;m hopeful that js2-mode might become a reasonable</=
div><div>choice as the default editing mode for JavaScript.</div><div><br><=
/div><div>I think espresso-mode is a fine fallback position. =A0Anything bu=
t</div>
<div>java-mode! =A0The default today is java-mode, and I had no qualms abou=
t</div><div>replacing it as the default for JavaScript.</div><div><br></div=
><div>Note: diagnostic messages in js2-mode are highlighted using overlays.=
</div>
<div>I tried using overlays for all highlighting but it was unacceptably</d=
iv><div>slow and had a tendency to crash Emacs. =A0But there are usually no=
t</div><div>prohibitively many errors and warnings, since the error-recover=
y</div>
<div>algorithm is somewhat coarse-grained. =A0So error-reporting works</div=
><div>independently of font-lock.</div><div><br></div><div>II. Multi-mode s=
upport</div><div><br></div><div>JavaScript is especially needful of mumamo =
(or equivalent) multi-mode</div>
<div>support, because much of the JavaScript in the wild is embedded in</di=
v><div>HTML, in template files, even in strings in other languages.</div><d=
iv><br></div><div>js2-mode does not support mumamo (or mmm-mode, which whic=
h I am</div>
<div>currently more familiar) because js2-mode&#39;s lexer needs to support=
</div><div>ignoring parts of the buffer. =A0I do not think this would be ve=
ry</div><div>hard to implement, but I have not done it yet.</div><div><br>
</div><div>If I don&#39;t get to it before the next version of Emacs launch=
es, then I</div><div>think this should effectively disqualify js2-mode from=
 being the</div><div>default JavaScript mode. =A0It would be an inconsisten=
t user experience</div>
<div>to have one JavaScript mode in .js files and another mode for</div><di=
v>JavaScript inside multi-mode-enabled files.</div><div><br></div><div>I=
9;m ready to give it a try, though, and I&#39;ll ping Lennart offline about=
</div>
<div>integrating the two somehow.</div><div><br></div><div>III. Incremental=
 and partial parsing</div><div><br></div><div>Lennart and others have asked=
 whether it is possible for js2-mode to</div><div>support partial or increm=
ental parsing. =A0The short answer is</div>
<div>&quot;incremental: yes; partial: no&quot;.</div><div><br></div><div>nx=
ml-mode, last I checked, does incremental parsing. =A0It parses ahead</div>=
<div>in the buffer, but then stops and saves its state. =A0If you jump forw=
ard</div>
<div>in the buffer, it resumes and continues the parse until some point</di=
v><div>beyond the section you&#39;re viewing.</div><div><br></div><div>js2-=
mode could do it this way without much additional effort. =A0I chose</div>
<div>not to because once you&#39;ve decided to use background parsing, it</=
div><div>doesn&#39;t seem like an especially useful optimization. =A0But I =
could see</div><div>it being helpful in some cases, such as when you&#39;re=
 editing near the</div>
<div>top of a large file -- as long as the whole file isn&#39;t encased in =
some</div><div>top-level expression, which unfortunately is often the case =
in JS.</div><div><br></div><div>Partial parsing is a different beast entire=
ly. =A0The goal of a partial</div>
<div>parser is to re-parse the minimum amount necessary, given some region<=
/div><div>that has changed. =A0I&#39;ve dug into this a bit, because origin=
ally I</div><div>wanted to support it in js2-mode. =A0I even made some prog=
ress on an</div>
<div>implementation.</div><div><br></div><div>While a few production parser=
s (for Java and JavaScript) have</div><div>implemented partial parsing, the=
 vast majority of them do not support</div><div>it -- instead, they re-pars=
e from the top. =A0They do this because the</div>
<div>incremental benefit of partial parsing is debatable, assuming you&#39;=
re</div><div>time- and resource-constrained, as most of us are.</div><div><=
br></div><div>I took a close look at Eclipse and IntelliJ, and even asked s=
ome</div>
<div>of their users to characterize the highlighting behavior of the IDE.</=
div><div>Without exception, the IDE users had internalized a ~1000 ms delay=
</div><div>in highlighting and error reporting as part of their normal work=
flow,</div>
<div>and they uniformly described it as &quot;instant{aneous}&quot; until I=
 made</div><div>them time it.</div><div><br></div><div>I&#39;ve been an Ema=
cs user for 20+ years now, and like many I found</div><div>the idea of a pa=
rsing delay to be somewhere between &quot;undesirable&quot;</div>
<div>and &quot;sickening&quot;. =A0But the majority of programmers today ha=
ve</div><div>apparently learned not to notice delays of ~1sec as long as it=
</div><div>never interferes with their typing or indentation (see IV below)=
.</div>
<div><br></div><div>So after looking at my ~8000 lines of elisp devoted to =
parsing</div><div>JavaScript, I weighed it and decided not to support parti=
al parsing.</div><div>It&#39;s certainly possible to support it, but I thin=
k my time would be</div>
<div>better spent on things that average users are more likely to notice.</=
div><div><br></div><div>YMMV, of course.</div><div><br></div><div>The upsho=
t is that if I&#39;m going to support mumamo, it will need</div><div>to wor=
k within js2-mode&#39;s existing full-reparse framework. =A0I can</div>
<div>think of various ways to make it work, though, and as I mentioned</div=
><div>I&#39;ll talk to Lennart about it.</div><div><br></div><div>IV. =A0In=
dentation</div><div><br></div><div>The indentation in js2-mode is broken. =
=A0I&#39;ll be the first to say it.</div>
<div><br></div><div>It is based on the indentation in Karl Langstrom&#39;s =
mode, which does a</div><div>better job for JavaScript than any indenter ba=
sed on cc-engine, but</div><div>that doesn&#39;t mean it&#39;s a good job. =
=A0And it&#39;s essentially unconfigurable.</div>
<div><br></div><div>espresso-mode shares this problem, which means that for=
 this</div><div>important use case it is not an improvement over js2-mode.<=
/div><div><br></div><div>Daniel&#39;s objections to js2-mode&#39;s non-inte=
raction with font-lock</div>
<div>apply equally to the non-interaction with cc-engine&#39;s indentation<=
/div><div>configuration system. =A0The indent configuration for JavaScript =
should</div><div>share as many settings as practical with cc-mode.</div><di=
v>
<br></div><div>I actually made a serious attempt to generate the `c-style-a=
list&#39;</div><div>data structure for js2-mode using the parse tree, but r=
an into three</div><div>issues:</div><div><br></div><div>=A0=A01) it&#39;s =
much harder than I thought it would be, even with a full</div>
<div>=A0=A0 =A0 parse tree available. =A0I had some 2000 lines of elisp inv=
ested</div><div>=A0=A0 =A0 in it when I pooped out, to be perfectly frank.<=
/div><div><br></div><div>=A0=A02) `c-style-alist&#39; (like font-lock) does=
 not have enough semantic</div>
<div>=A0=A0 =A0 =A0variables to encompass the range of indentation contexts=
 that</div><div>=A0=A0 =A0 =A0JavaScript programmers care about. =A0I think=
 we&#39;d need to add</div><div>=A0=A0 =A0 =A05-10 more, although it&#39;s =
been 18 months since I looked into it.</div>
<div><br></div><div>=A0=A03) indentation in &quot;normal&quot; Emacs modes =
also runs synchronously as</div><div>=A0=A0 =A0 the user types. =A0Waiting =
500-800 msec or more for the parse to</div><div>=A0=A0 =A0 finish is (I thi=
nk) not acceptable for indentation. =A0For small</div>
<div>=A0=A0 =A0 files the parse time is acceptable, but it would not be gen=
erally</div><div>=A0=A0 =A0 scalable.</div><div><br></div><div>#3 is the re=
ason I gave up on #1. =A0It didn&#39;t seem to be worth the</div><div>effor=
t to produce an accurate but slow indenter.</div>
<div><br></div><div>I don&#39;t know exactly how to solve this problem. =A0=
I have lots of</div><div>ideas, but it appears there are few low-hanging fr=
uit in this space.</div><div><br></div><div>V. Font Lock framework design p=
roblems</div>
<div><br></div><div>There seems to be a common misconception flitting about=
 to the</div><div>effect that font-lock is perfect and will never need to c=
hange.</div><div><br></div><div>This is a somewhat paradoxical viewpoint in=
 view of the corpses</div>
<div>littering the path to jit-lock, which include font-lock, fast-lock,</d=
iv><div>lazy-lock, and vapor-lock. =A0Each decade we&#39;ve had a cadre of =
people</div><div>claiming that *-lock meets everyone&#39;s needs, and then =
it gets rewritten</div>
<div>anyway.</div><div><br></div><div>So it&#39;s hard to understand how it=
 remains such a popular viewpoint.</div><div><br></div><div>I&#39;ll make y=
et another attempt to dispel it, since once we&#39;re past the</div><div>
emotional stumbling blocks, font-lock may be able to evolve again.</div><di=
v><br></div><div>Va) Inadequate/insufficient style names</div><div><br></di=
v><div>There are not enough font-lock faces to represent all the semantic</=
div>
<div>style runs that are identifiable to &quot;real&quot; language analyzer=
s.</div><div>js2-mode makes several semantic distinctions not available in =
most</div><div>Emacs modes, although such distinctions are available in JDE=
E and</div>
<div>other Cedet-enabled modes, so js2-mode is by no means alone in its</di=
v><div>needs.</div><div><br></div><div>In addition to the autoloaded font-l=
ock faces, which js2-mode uses</div><div>whenever possible, js2-mode define=
s several new faces, including:</div>
<div><br></div><div>=A0=A0* function parameters</div><div>=A0=A0* &quot;cla=
ss&quot; instance members (in JS, prototype and instance props)</div><div>=
=A0=A0* local variables</div><div>=A0=A0* undeclared variables</div><div>=
=A0=A0* private members (although I implemented it poorly -- see below)</di=
v>
<div>=A0=A0* html/xml tags, attr names and delimiters -- used both for html=
</div><div>=A0=A0 =A0in jsdoc comments and for E4X literals</div><div>=A0=
=A0* doc tags such as those typically found in javadoc/jsdoc comments</div>=
<div>=A0=A0* warnings, errors, and informational diagnostics</div>
<div><br></div><div>I do not expect that this set is all-inclusive -- over =
time as js2-mode</div><div>and similar modes get smarter, they will be able=
 to make other</div><div>semantic distinctions that users may wish to custo=
mize independently.</div>
<div>Given that Emacs is the most configurable editor on the planet, I do</=
div><div>not see any reason to entertain arguments to the contrary.</div><d=
iv><br></div><div>Vb) Ad-hoc default faces that are not being autoloaded</d=
iv>
<div><br></div><div>There are some modes (e.g. sgml-mode, html-mode, nxml-m=
ode) that</div><div>define their own versions of some of the xml/html faces=
, but it did</div><div>not seem right to make js2-mode &#39;require one of =
these modes just to</div>
<div>get at ad-hoc &quot;standard&quot; definitions for these faces.</div><=
div><br></div><div>We should define standard faces for xml/html tags and en=
tities, and</div><div>for any other faces that are effectively defined by 2=
 or more modes.</div>
<div><br></div><div>Vc) Additional semantic styles not needed by JavaScript=
</div><div><br></div><div>I have other language modes in progress, and toge=
ther they define an</div><div>ever larger set of semantic styles. =A0The se=
t of available font-lock</div>
<div>names should try to encompass the _union_ of the needs of most</div><d=
iv>languages, not the intersection. =A0There should, for instance, be a</di=
v><div>font-lock-symbol-face for languages with distinguished symbols such<=
/div>
<div>as Lisp, Scheme and Ruby.</div><div><br></div><div>I think this is rel=
atively easy to fix, provided a little thought</div><div>goes into choosing=
 the new faces. =A0Vd and Ve below should help</div><div>clarify why it req=
uires greater than zero thought.</div>
<div><br></div><div>Vd) Composable semantic styles</div><div><br></div><div=
>Some font-lock faces represent &quot;primary&quot; semantic roles, in a va=
gue</div><div>way. =A0For instance, there is a font-lock-function-name-face=
, and</div>
<div>this is different from font-lock-variable-name-face. =A0While in some<=
/div><div>languages (including JavaScript) the distinction is not necessari=
ly</div><div>exact, they can usually be reconciled -- e.g. being a function=
 is</div>
<div>a more &quot;important&quot; property of an identifier than being a va=
riable.</div><div><br></div><div>Most of the font-lock faces represent very=
 common primary roles:</div><div>strings, comments, keywords, types, prepro=
cessor macros. =A0But not all.</div>
<div>font-lock-constant face is actually orthogonal to the primary role.</d=
iv><div>A class or method or parameter can be const or non-const in some</d=
iv><div>languages.</div><div><br></div><div>The semantic notion of public/p=
rivate/protected/package/friend</div>
<div>visibility is another example. =A0So is &quot;abstract&quot;/&quot;pur=
e virtual&quot;.</div><div><br></div><div>Emacs supports composable faces (=
a style run may have multiple</div><div>faces, and the attributes compose a=
ccording to predefined rules),</div>
<div>but font-lock provides neither consistent nor adequate support for</di=
v><div>this notion.</div><div><br></div><div>Ve) Ambiguous semantic styles<=
/div><div><br></div><div>At least one of the face names is ambiguous -- it&=
#39;s not clear what</div>
<div>font-lock-builtin-face is actually supposed to highlight. =A0The resul=
t</div><div>is that different language modes use it for different kinds of<=
/div><div>entities. =A0If you customize the face for one mode, you may wind=
 up</div>
<div>with unsatisfying results in another mode due to the differences</div>=
<div>in relative weighting/distribution of semantic types across languages.=
</div><div><br></div><div>As a hypothetical example, someone might enhance =
python-mode to</div>
<div>use font-lock-builtin-face to highlight True/False/None and possibly</=
div><div>&quot;self&quot;, since they&#39;re not keywords but they are all =
handled specially</div><div>by the runtime. =A0(font-lock-type-face might b=
e better for this, but</div>
<div>since they&#39;re not really classes, you could argue it either way).<=
/div><div>These tokens appear relatively infrequently in Python. =A0If some=
one</div><div>else were to use it to highlight functions implemented in C i=
n elisp,</div>
<div>there would be a lot more of that face appearing in elisp buffers,</di=
v><div>and it might not be easy to choose one face that looks nice in both<=
/div><div>situations.</div><div><br></div><div>Regardless of the fate of js=
2-mode, font-lock needs to add more</div>
<div>semantic faces. =A0By default these new faces might simply inherit fac=
e</div><div>attributes from their &quot;syntactic parents&quot; -- e.g. the=
 faces for</div><div>locals, parameters, instance and static vars might all=
 inherit the</div>
<div>settings for `font-lock-variable-name-face&#39;. =A0But users should b=
e</div><div>able to differentiate among them when the information is availa=
ble.</div><div><br></div><div>Vf) No font-lock interface for setting exact =
style runs</div>
<div><br></div><div>I could be mistaken here -- if so, please correct me.</=
div><div><br></div><div>My limited understanding of font-lock and its main =
entry-point</div><div>mechanisms such as font-lock-keywords and font-lock-a=
pply-highlight,</div>
<div>all of which use the MATCH-HIGHLIGHT data structure, is that they</div=
><div>are not quite=A0powerful enough for my needs in their current incarna=
tion.</div><div><br></div><div>This issue is independent of asynchronous pa=
rsing -- I think that</div>
<div>even if my parser were instantaneous, I would still have this issue.</=
div><div><br></div><div>The problem is that I need a way, in a given font-l=
ock redisplay, to</div><div>say &quot;highlight the region from X to Y with=
 text properties {Z}&quot;.</div>
<div><br></div><div>This use case does not seem like it should be inordinat=
ely difficult</div><div>to support, but it does not seem to be supported to=
day.</div><div><br></div><div>When I assert that it&#39;s not possible, I u=
nderstand that it&#39;s</div>
<div>_theoretically_ possible. =A0Given a JavaScript file with 2500 style</=
div><div>runs, assuming I had that information available at font-lock time,=
 I</div><div>could return a matcher that contains 2500 regular expressions,=
 each</div>
<div>one of which is tailored to match one and exactly one region in the</d=
iv><div>buffer.</div><div><br></div><div>In practice, however, I am not awa=
re of a way to do this that is</div><div>either clean or efficient.</div>
<div><br></div><div>If this simple feature were supported, I would have a g=
reat deal more</div><div>incentive to try to get my parsing to be fast enou=
gh to work within</div><div>the time constraints users expect from font-loc=
k.</div>
<div><br></div><div>Vg) Lack of differentiation between mode- and minor-mod=
e styles</div><div><br></div><div>One of the most common complaints from th=
e thousands of users of</div><div>js2-mode, most of whom have exercised eno=
ugh self-restraint to use the</div>
<div>term &quot;work in progress&quot; in preference to &quot;abomination&q=
uot;, is that</div><div>js2-mode has poor support for minor modes that do t=
heir work with</div><div>font-lock -- 80-column highlighters being a popula=
r example, although</div>
<div>there are others.</div><div><br></div><div>The fundamental problem her=
e is that the font-lock framework does not</div><div>differentiate between =
the mode&#39;s syntax highlighting and the keywords</div><div>installed by =
minor modes and by user code. =A0Instead, it merges them.</div>
<div><br></div><div>As far as I can tell, the officially supported mechanis=
m for</div><div>adding additional font-lock patterns is `font-lock-add-keyw=
ords&#39;.</div><div>This either appends or prepends the keywords to the de=
faults.</div>
<div><br></div><div>It might be possible to reverse-engineer it, for instan=
ce by manually</div><div>diffing the buffer&#39;s font-lock-defaults and fo=
nt-lock-keywords and</div><div>trying to figure out which ones were added b=
y participants other than</div>
<div>the major mode. =A0Even if it&#39;s possible, it&#39;s not clear that =
it always</div><div>works now, and would always work in the future.</div><d=
iv><br></div><div>For one thing, it&#39;s possible (as Daniel observes) to =
bypass this</div>
<div>mechanism and call font-lock-apply-highlight directly, which makes</di=
v><div>the reverse-engineering even more cumbersome and fragile.</div><div>=
<br></div><div>(Vf) is the reason (Vg) is a problem for js2-mode. =A0font-l=
ock-defaults</div>
<div>does not seem to be a very satisfactory way to apply 2000-10000</div><=
div>precise style runs to a buffer, so I do all my own highlighting,</div><=
div>and it doesn&#39;t include style-run contributions from minor modes.</d=
iv>
<div><br></div><div>I&#39;ve made some halfhearted attempts to hack around =
the problem, but</div><div>they&#39;ve proven fragile. =A0If font-lock were=
 to support (Vf), then I</div><div>think (Vg) should &quot;just work&quot;.=
</div>
<div><br></div><div>VI. =A0Summary</div><div><br></div><div>I&#39;ve called=
 out some of the main integration issues I&#39;ve encountered.</div><div>I&=
#39;ve penned several major and minor language modes, not just js2-mode,</d=
iv>
<div>and I&#39;ve chosen to whine here about the problems that could best b=
e</div><div>classified as &quot;problem themes&quot;.</div><div><br></div><=
div>I&#39;m around, and I&#39;m available for nontrivial work. =A0If group =
consensus</div>
<div>is that js2-mode isn&#39;t ready yet, I&#39;m happy to keep hacking on=
 it and</div><div>taking user patches and feedback until Emacs 24 rolls aro=
und.</div><div><br></div><div>But it would be nice to have more direct supp=
ort for modes like mine.</div>
<div>I&#39;m willing to do my end of it, but I&#39;m always oversubscribed,=
 and I&#39;ve</div><div>already signed up to support mouse-enter and mouse-=
left text props</div><div>as part of another js2-mode-related thread.</div>
<div><br></div><div>So a little help would go a long way.</div><div><br></d=
iv><div>-steve</div></div>

--001636164225fb42000470d81398--