From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steve Yegge
Newsgroups: gmane.emacs.devel
Subject: Re: "Font-lock is limited to text matching" is a myth
Date: Mon, 10 Aug 2009 23:47:37 -0700
Message-ID: <9c768dc60908102347v57bdf38ara9fe2179f68c07e4@mail.gmail.com>
References: <7b501d5c0908091634ndfba631vd9db6502db301097@mail.gmail.com> <200908101335.24002.danc@merrillprint.com> <87my67s8mr.fsf@randomsample.de> <1249942011.29022.15.camel@projectile.siege-engine.com> <1249955428.29022.186.camel@projectile.siege-engine.com>
In-Reply-To: <1249955428.29022.186.camel@projectile.siege-engine.com>
To: emacs-devel@gnu.org
Cc: Daniel Colascione, David Engster, Daniel Colascione, Lennart Borgman, Deniz Dogan, Stefan Monnier, Leo, Miles Bader

Hello all,

Thanks for opening this can of, er, threads. I was going to ask about these things myself soon in any case, because it's clear that js2-mode is not doing a very effective job of surfacing its rich information in Emacs. This is partly my fault, but it is also partly due to some issues with font-lock that I'll describe in nauseating detail.

There are several important ideas being conflated in this thread that I think need to be teased apart before we can talk responsibly about any of them. I've called out the top five conflations in sections below delimited by roman numerals.

This is all in some sense an elaboration of what Eric Ludlam just posted, to which I can only add my miserable +1.

Stephen Eilert wrote:
> I do not think that was done without a very good reason (and there's
> a lengthy post explaining it), unless the author is a complete
> masochist.

I don't think of myself that way. Here, as requested, is a lengthy post explaining my approach. For the record, it could have been much lengthier, and I have lengthy replies ready for all your objections and concerns. (Just in case you were wondering.)

I really do want to get this resolved, though.

I. Asynchronous parsing

js2-mode performs both syntactic and (some) semantic analysis. It knows, for instance, when you're using a symbol that's not defined in its file. js2-mode does not currently understand project structure, but I'm doing some work in this area, and it may at some point gather semantic information collected from several files.

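As a rough, language-neutral illustration of the "symbol that's not defined in its file" check mentioned above, here is a minimal Python sketch. It is not js2-mode's code: the `Ident` record and the `undeclared_uses` helper are hypothetical names, and the sketch assumes a parser has already produced a flat list of identifiers and ignores scoping entirely.

```python
from typing import NamedTuple


class Ident(NamedTuple):
    """One identifier occurrence reported by a (hypothetical) parser."""
    name: str
    pos: int        # buffer offset of the identifier
    is_decl: bool   # True for a declaration, False for a use


def undeclared_uses(idents):
    """Return the uses whose name is never declared anywhere in the file."""
    declared = {i.name for i in idents if i.is_decl}
    return [i for i in idents if not i.is_decl and i.name not in declared]


idents = [
    Ident("total", 4, True),     # var total
    Ident("total", 20, False),   # a use of total
    Ident("count", 28, False),   # a use of count, never declared here
]
warnings = undeclared_uses(idents)
```

The point of the sketch is only that the check needs the whole file's declarations before any single use can be judged, which is why a full parse comes first.
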
Because this analysis requires parsing the entire file at least once (see my discussion of partial/incremental parsing below), and it may someday involve looking at symbol tables from other files, it seemed best to run the parse asynchronously, so as not to interfere with the user's editing.

One byproduct of having an accurate parser and symbol table is that you can obtain style runs with relatively small effort, so js2-mode does its own highlighting. The downside is that this highlighting information is unavailable at font-lock time, and it is not available piecewise -- it's all-or-nothing.

There is a relatively simple alternative that might appease Daniel: I could have js2-mode simply not do any highlighting by default, except for errors and warnings. We'd use whatever highlighting is provided by espresso-mode, and users would be able to choose between espresso-highlighting and js2-mode highlighting. With the former, they'd get "instantaneous" font-locking, albeit not as rich as what js2-mode can provide.

This would be trivial to change. I am actively maintaining js2-mode, and the only reason I haven't checked in any changes since my initial commit to the trunk is inexperience: I'm trying to get a handle on how many changes people tend to aggregate before checking in a change to any given mode. But I have several fixes (including some patches contributed by users) that are ready to commit, and more on the way.

Errors and warnings would still need to be asynchronous (if they're enabled). So, too, would the imenu outline and my in-progress buffer-based outline, which is somewhat nicer than the imenu one.

But I think the main objection to js2-mode revolves around its highlighting, correct? If so, AND if we can solve the font-lock integration issues, AND if we can fix the multi-mode issues (II below), then I'm hopeful that js2-mode might become a reasonable choice as the default editing mode for JavaScript.

I think espresso-mode is a fine fallback position. Anything but java-mode! The default today is java-mode, and I had no qualms about replacing it as the default for JavaScript.

Note: diagnostic messages in js2-mode are highlighted using overlays. I tried using overlays for all highlighting but it was unacceptably slow and had a tendency to crash Emacs. But there are usually not prohibitively many errors and warnings, since the error-recovery algorithm is somewhat coarse-grained. So error-reporting works independently of font-lock.

II. Multi-mode support

JavaScript is especially needful of mumamo (or equivalent) multi-mode support, because much of the JavaScript in the wild is embedded in HTML, in template files, even in strings in other languages.

js2-mode does not support mumamo (or mmm-mode, with which I am currently more familiar) because js2-mode's lexer would first need to be able to ignore parts of the buffer. I do not think this would be very hard to implement, but I have not done it yet.

If I don't get to it before the next version of Emacs launches, then I think this should effectively disqualify js2-mode from being the default JavaScript mode. It would be an inconsistent user experience to have one JavaScript mode in .js files and another mode for JavaScript inside multi-mode-enabled files.

I'm ready to give it a try, though, and I'll ping Lennart offline about integrating the two somehow.

III. Incremental and partial parsing

Lennart and others have asked whether it is possible for js2-mode to support partial or incremental parsing. The short answer is "incremental: yes; partial: no".

nxml-mode, last I checked, does incremental parsing. It parses ahead in the buffer, but then stops and saves its state. If you jump forward in the buffer, it resumes and continues the parse until some point beyond the section you're viewing.

js2-mode could do it this way without much additional effort. I chose not to because once you've decided to use background parsing, it doesn't seem like an especially useful optimization. But I could see it being helpful in some cases, such as when you're editing near the top of a large file -- as long as the whole file isn't encased in some top-level expression, which unfortunately is often the case in JS.

Partial parsing is a different beast entirely. The goal of a partial parser is to re-parse the minimum amount necessary, given some region that has changed. I've dug into this a bit, because originally I wanted to support it in js2-mode. I even made some progress on an implementation.

While a few production parsers (for Java and JavaScript) have implemented partial parsing, the vast majority of them do not support it -- instead, they re-parse from the top. They do this because the incremental benefit of partial parsing is debatable, assuming you're time- and resource-constrained, as most of us are.

I took a close look at Eclipse and IntelliJ, and even asked some of their users to characterize the highlighting behavior of the IDE. Without exception, the IDE users had internalized a ~1000 ms delay in highlighting and error reporting as part of their normal workflow, and they uniformly described it as "instant{aneous}" until I made them time it.

I've been an Emacs user for 20+ years now, and like many I found the idea of a parsing delay to be somewhere between "undesirable" and "sickening". But the majority of programmers today have apparently learned not to notice delays of ~1sec as long as it never interferes with their typing or indentation (see IV below).

So after looking at my ~8000 lines of elisp devoted to parsing JavaScript, I weighed it and decided not to support partial parsing. It's certainly possible to support it, but I think my time would be better spent on things that average users are more likely to notice.

YMMV, of course.

The upshot is that if I'm going to support mumamo, it will need to work within js2-mode's existing full-reparse framework.
I can think of various ways to make it work, though, and as I mentioned I'll talk to Lennart about it.

IV. Indentation

The indentation in js2-mode is broken. I'll be the first to say it.

It is based on the indentation in Karl Langstrom's mode, which does a better job for JavaScript than any indenter based on cc-engine, but that doesn't mean it's a good job. And it's essentially unconfigurable.

espresso-mode shares this problem, which means that for this important use case it is not an improvement over js2-mode.

Daniel's objections to js2-mode's non-interaction with font-lock apply equally to the non-interaction with cc-engine's indentation configuration system. The indent configuration for JavaScript should share as many settings as practical with cc-mode.

I actually made a serious attempt to generate the `c-style-alist' data structure for js2-mode using the parse tree, but ran into three issues:

  1) it's much harder than I thought it would be, even with a full parse tree available. I had some 2000 lines of elisp invested in it when I pooped out, to be perfectly frank.

  2) `c-style-alist' (like font-lock) does not have enough semantic variables to encompass the range of indentation contexts that JavaScript programmers care about. I think we'd need to add 5-10 more, although it's been 18 months since I looked into it.

  3) indentation in "normal" Emacs modes also runs synchronously as the user types. Waiting 500-800 msec or more for the parse to finish is (I think) not acceptable for indentation. For small files the parse time is acceptable, but it would not be generally scalable.

#3 is the reason I gave up on #1. It didn't seem to be worth the effort to produce an accurate but slow indenter.

I don't know exactly how to solve this problem. I have lots of ideas, but it appears there is little low-hanging fruit in this space.

V. Font Lock framework design problems

There seems to be a common misconception flitting about to the effect that font-lock is perfect and will never need to change.

This is a somewhat paradoxical viewpoint in view of the corpses littering the path to jit-lock, which include font-lock, fast-lock, lazy-lock, and vapor-lock. Each decade we've had a cadre of people claiming that *-lock meets everyone's needs, and then it gets rewritten anyway.

So it's hard to understand how it remains such a popular viewpoint.

I'll make yet another attempt to dispel it, since once we're past the emotional stumbling blocks, font-lock may be able to evolve again.

Va) Inadequate/insufficient style names

There are not enough font-lock faces to represent all the semantic style runs that are identifiable to "real" language analyzers. js2-mode makes several semantic distinctions not available in most Emacs modes, although such distinctions are available in JDEE and other Cedet-enabled modes, so js2-mode is by no means alone in its needs.

In addition to the autoloaded font-lock faces, which js2-mode uses whenever possible, js2-mode defines several new faces, including:

 * function parameters
 * "class" instance members (in JS, prototype and instance props)
 * local variables
 * undeclared variables
 * private members (although I implemented it poorly -- see below)
 * html/xml tags, attr names and delimiters -- used both for html in jsdoc comments and for E4X literals
 * doc tags such as those typically found in javadoc/jsdoc comments
 * warnings, errors, and informational diagnostics

I do not expect that this set is all-inclusive -- over time as js2-mode and similar modes get smarter, they will be able to make other semantic distinctions that users may wish to customize independently. Given that Emacs is the most configurable editor on the planet, I do not see any reason to entertain arguments to the contrary.

Vb) Ad-hoc default faces that are not being autoloaded

There are some modes (e.g. sgml-mode, html-mode, nxml-mode) that define their own versions of some of the xml/html faces, but it did not seem right to make js2-mode 'require one of these modes just to get at ad-hoc "standard" definitions for these faces. We should define standard faces for xml/html tags and entities, and for any other faces that are effectively defined by 2 or more modes.

Vc) Additional semantic styles not needed by JavaScript

I have other language modes in progress, and together they define an ever larger set of semantic styles. The set of available font-lock names should try to encompass the _union_ of the needs of most languages, not the intersection. There should, for instance, be a font-lock-symbol-face for languages with distinguished symbols such as Lisp, Scheme and Ruby.

I think this is relatively easy to fix, provided a little thought goes into choosing the new faces. Vd and Ve below should help clarify why it requires greater than zero thought.

Vd) Composable semantic styles

Some font-lock faces represent "primary" semantic roles, in a vague way. For instance, there is a font-lock-function-name-face, and this is different from font-lock-variable-name-face. While in some languages (including JavaScript) the distinction is not necessarily exact, they can usually be reconciled -- e.g. being a function is a more "important" property of an identifier than being a variable.

Most of the font-lock faces represent very common primary roles: strings, comments, keywords, types, preprocessor macros. But not all. font-lock-constant-face is actually orthogonal to the primary role. A class or method or parameter can be const or non-const in some languages. The semantic notion of public/private/protected/package/friend visibility is another example. So is "abstract"/"pure virtual".

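The split between a primary role and orthogonal properties described above can be sketched in a few lines of Python. This is an illustration only, not Emacs's face-merging code: the `PRIMARY` and `MODIFIERS` tables, their attribute values, and the `compose` helper are all hypothetical, and the "primary role wins on conflict" rule is an assumption made for the sketch.

```python
# Hypothetical style tables: one primary role per identifier,
# plus any number of orthogonal modifiers layered on top.
PRIMARY = {
    "function": {"foreground": "blue"},
    "variable": {"foreground": "goldenrod"},
}
MODIFIERS = {
    "const": {"weight": "bold"},
    "private": {"slant": "italic"},
}


def compose(primary, modifiers=()):
    """Merge a primary role's attributes with those of its modifiers.

    Assumption for this sketch: the primary role wins on conflict,
    and earlier modifiers win over later ones.
    """
    attrs = dict(PRIMARY[primary])
    for mod in modifiers:
        for key, value in MODIFIERS[mod].items():
            attrs.setdefault(key, value)
    return attrs


style = compose("variable", ["const", "private"])
```

With tables like these, "const" and "private" never need their own competing primary faces; they only contribute attributes that the primary role leaves unset.
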
Emacs supports composable faces (a style run may have multiple faces, and the attributes compose according to predefined rules), but font-lock provides neither consistent nor adequate support for this notion.

Ve) Ambiguous semantic styles

At least one of the face names is ambiguous -- it's not clear what font-lock-builtin-face is actually supposed to highlight. The result is that different language modes use it for different kinds of entities. If you customize the face for one mode, you may wind up with unsatisfying results in another mode due to the differences in relative weighting/distribution of semantic types across languages.

As a hypothetical example, someone might enhance python-mode to use font-lock-builtin-face to highlight True/False/None and possibly "self", since they're not keywords but they are all handled specially by the runtime. (font-lock-type-face might be better for this, but since they're not really classes, you could argue it either way). These tokens appear relatively infrequently in Python. If someone else were to use it to highlight functions implemented in C in elisp, there would be a lot more of that face appearing in elisp buffers, and it might not be easy to choose one face that looks nice in both situations.

Regardless of the fate of js2-mode, font-lock needs to add more semantic faces. By default these new faces might simply inherit face attributes from their "syntactic parents" -- e.g. the faces for locals, parameters, instance and static vars might all inherit the settings for `font-lock-variable-name-face'. But users should be able to differentiate among them when the information is available.

Vf) No font-lock interface for setting exact style runs

I could be mistaken here -- if so, please correct me.

My limited understanding of font-lock and its main entry-point mechanisms such as font-lock-keywords and font-lock-apply-highlight, all of which use the MATCH-HIGHLIGHT data structure, is that they are not quite powerful enough for my needs in their current incarnation.

This issue is independent of asynchronous parsing -- I think that even if my parser were instantaneous, I would still have this issue. The problem is that I need a way, in a given font-lock redisplay, to say "highlight the region from X to Y with text properties {Z}". This use case does not seem like it should be inordinately difficult to support, but it does not seem to be supported today.

When I assert that it's not possible, I understand that it's _theoretically_ possible. Given a JavaScript file with 2500 style runs, assuming I had that information available at font-lock time, I could return a matcher that contains 2500 regular expressions, each one of which is tailored to match one and exactly one region in the buffer.

In practice, however, I am not aware of a way to do this that is either clean or efficient. If this simple feature were supported, I would have a great deal more incentive to try to get my parsing to be fast enough to work within the time constraints users expect from font-lock.

Vg) Lack of differentiation between mode- and minor-mode styles

One of the most common complaints from the thousands of users of js2-mode, most of whom have exercised enough self-restraint to use the term "work in progress" in preference to "abomination", is that js2-mode has poor support for minor modes that do their work with font-lock -- 80-column highlighters being a popular example, although there are others.

The fundamental problem here is that the font-lock framework does not differentiate between the mode's syntax highlighting and the keywords installed by minor modes and by user code. Instead, it merges them.

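To make the merging problem concrete, here is a small Python sketch contrasting a single flat keyword list with one that records who contributed each entry. It models the idea only and is not font-lock's data structure: `merge_flat`, `merge_tagged`, `from_owner`, the owner labels, and the sample patterns are all hypothetical.

```python
def merge_flat(mode_keywords, added_keywords):
    """One flat list: once merged, nothing records who added what."""
    return list(added_keywords) + list(mode_keywords)


def merge_tagged(mode_keywords, added_keywords):
    """Same merge, but each entry carries a (hypothetical) owner label."""
    return ([("added", k) for k in added_keywords]
            + [("major-mode", k) for k in mode_keywords])


def from_owner(tagged, owner):
    """Recover only the keywords contributed by OWNER."""
    return [k for o, k in tagged if o == owner]


mode_kw = [r"\bfunction\b", r"\bvar\b"]   # stand-ins for the major mode's patterns
extra_kw = [r"^.{81,}$"]                  # hypothetical pattern for lines over 80 columns
flat = merge_flat(mode_kw, extra_kw)
tagged = merge_tagged(mode_kw, extra_kw)
minor_mode_only = from_owner(tagged, "added")
```

A mode that computes its own style runs only wants the second kind of entry; with the flat list it has no reliable way to pick those out.
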
As far as I can tell, the officially supported mechanism for adding additional font-lock patterns is `font-lock-add-keywords'. This either appends or prepends the keywords to the defaults.

It might be possible to reverse-engineer it, for instance by manually diffing the buffer's font-lock-defaults and font-lock-keywords and trying to figure out which ones were added by participants other than the major mode.

Even if it's possible, it's not clear that it always works now, and would always work in the future. For one thing, it's possible (as Daniel observes) to bypass this mechanism and call font-lock-apply-highlight directly, which makes the reverse-engineering even more cumbersome and fragile.

(Vf) is the reason (Vg) is a problem for js2-mode. font-lock-defaults does not seem to be a very satisfactory way to apply 2000-10000 precise style runs to a buffer, so I do all my own highlighting, and it doesn't include style-run contributions from minor modes. I've made some halfhearted attempts to hack around the problem, but they've proven fragile.

If font-lock were to support (Vf), then I think (Vg) should "just work".

VI. Summary

I've called out some of the main integration issues I've encountered. I've penned several major and minor language modes, not just js2-mode, and I've chosen to whine here about the problems that could best be classified as "problem themes".

I'm around, and I'm available for nontrivial work. If group consensus is that js2-mode isn't ready yet, I'm happy to keep hacking on it and taking user patches and feedback until Emacs 24 rolls around.

But it would be nice to have more direct support for modes like mine. I'm willing to do my end of it, but I'm always oversubscribed, and I've already signed up to support mouse-enter and mouse-left text props as part of another js2-mode-related thread. So a little help would go a long way.

-steve --001636164225fb42000470d81398 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Hello all,

Thanks for opening this can o= f, er, threads. =A0I was going to ask about
these things myself s= oon in any case, because it's clear that js2-mode
is not doin= g a very effective job of surfacing its rich information in
Emacs. =A0This is partly my fault, but it is also partly due to some
issues with font-lock that I'll describe in nauseating detail.=

There are several important ideas being conflated= in this thread that
I think need to be teased apart before we can talk responsibly about
any of them. =A0I've called out the top five conflations in se= ctions
below delimited by roman numerals.

This is all in some sense an elaboration of what Eric Ludlam just
posted, to which I can only add my miserable +1.

= Stephen Eilert wrote:
> I do not think that was done without a= very good reason (and there's
=A0=A0a lengthy post explaining it), unless the author is a complete
=A0=A0masochist.

I don't think of mys= elf that way. =A0Here, as requested, is a lengthy post
explaining= my approach. =A0For the record, it could have been much
lengthier, and I have lengthy replies ready for all your objections an= d
concerns. =A0(Just in case you were wondering.)

<= /div>
I really do want to get this resolved, though.

I. Asynchronous parsing

js2-mode performs bot= h syntactic and (some) semantic analysis. =A0It
knows, for instan= ce, when you're using a symbol that's not defined in
its = file. =A0js2-mode does not currently understand project structure,
but I'm doing some work in this area, and it may at some point gat= her
semantic information collected from several files.
=
Because this analysis requires parsing the entire file at le= ast once
(see my discussion of partial/incremental parsing below), and it may
someday involve looking at symbol tables from other files, it seem= ed
best to run the parse asynchronously, so as not to interfere w= ith the
user's editing.

One byproduct of having a= n accurate parser and symbol table is that
you can obtain style r= uns with relatively small effort, so js2-mode
does its own highli= ghting. =A0The downside is that this highlighting
information is unavailable at font-lock time, and it is not available<= /div>
piecewise -- it's all-or-nothing.

Th= ere is a relatively simple alternative that might appease Daniel:
I could have js2-mode simply not do any highlighting by default,
=
except for errors and warnings. =A0We'd use whatever highlighting = is
provided by espresso-mode, and users would be able to choose b= etween
espresso-highlighting and js2-mode highlighting. =A0With the former,
they'd get "instantaneous" font-locking, albeit not = as rich as what
js2-mode can provide.

This would be trivial to change. =A0I am actively maintaining js2-mode,
and the only reason I haven't checked in any changes since my in= itial
commit to the trunk is inexperience: =A0I'm trying to g= et a handle on how
many changes people tend to aggregate before checking in a change to
any given mode. =A0But I have several fixes (including some patche= s
contributed from users) that are ready to commit, and more on t= he way.

Errors and warnings would still need to be asynchronous= (if they're
enabled). =A0So, too, would the imenu outline an= d my in-progress
buffer-based outline, which is somewhat nicer th= an the IMenu one.

But I think the main objection to js2-mode revolves aro= und its
highlighting, correct? =A0If so, AND if we can solve the = font-lock
integration issues, AND if we can fix the multi-mode is= sues (II
below), then I'm hopeful that js2-mode might become a reasonable
choice as the default editing mode for JavaScript.

<= /div>
I think espresso-mode is a fine fallback position. =A0Anything bu= t
java-mode! =A0The default today is java-mode, and I had no qualms abou= t
replacing it as the default for JavaScript.

Note: diagnostic messages in js2-mode are highlighted using overlays.=
I tried using overlays for all highlighting but it was unacceptably
slow and had a tendency to crash Emacs. =A0But there are usually no= t
prohibitively many errors and warnings, since the error-recover= y
algorithm is somewhat coarse-grained. =A0So error-reporting works
independently of font-lock.

II. Multi-mode s= upport

JavaScript is especially needful of mumamo = (or equivalent) multi-mode
support, because much of the JavaScript in the wild is embedded in
HTML, in template files, even in strings in other languages.

js2-mode does not support mumamo (or mmm-mode, which whic= h I am
currently more familiar) because js2-mode's lexer needs to support=
ignoring parts of the buffer. =A0I do not think this would be ve= ry
hard to implement, but I have not done it yet.

If I don't get to it before the next version of Emacs launch= es, then I
think this should effectively disqualify js2-mode from= being the
default JavaScript mode. =A0It would be an inconsisten= t user experience
to have one JavaScript mode in .js files and another mode for
JavaScript inside multi-mode-enabled files.

I= 9;m ready to give it a try, though, and I'll ping Lennart offline about=
integrating the two somehow.

III. Incremental= and partial parsing

Lennart and others have asked= whether it is possible for js2-mode to
support partial or increm= ental parsing. =A0The short answer is
"incremental: yes; partial: no".

nx= ml-mode, last I checked, does incremental parsing. =A0It parses ahead
=
in the buffer, but then stops and saves its state. =A0If you jump forw= ard
in the buffer, it resumes and continues the parse until some point
beyond the section you're viewing.

js2-= mode could do it this way without much additional effort. =A0I chose
not to because once you've decided to use background parsing, it
doesn't seem like an especially useful optimization. =A0But I = could see
it being helpful in some cases, such as when you're= editing near the
top of a large file -- as long as the whole file isn't encased in = some
top-level expression, which unfortunately is often the case = in JS.

Partial parsing is a different beast entire= ly. =A0The goal of a partial
parser is to re-parse the minimum amount necessary, given some region<= /div>
that has changed. =A0I've dug into this a bit, because origin= ally I
wanted to support it in js2-mode. =A0I even made some prog= ress on an
implementation.

While a few production parser= s (for Java and JavaScript) have
implemented partial parsing, the= vast majority of them do not support
it -- instead, they re-pars= e from the top. =A0They do this because the
incremental benefit of partial parsing is debatable, assuming you'= re
time- and resource-constrained, as most of us are.
<= br>
I took a close look at Eclipse and IntelliJ, and even asked s= ome
of their users to characterize the highlighting behavior of the IDE.
Without exception, the IDE users had internalized a ~1000 ms delay=
in highlighting and error reporting as part of their normal work= flow,
and they uniformly described it as "instant{aneous}" until I= made
them time it.

I've been an Ema= cs user for 20+ years now, and like many I found
the idea of a pa= rsing delay to be somewhere between "undesirable"
and "sickening". =A0But the majority of programmers today ha= ve
apparently learned not to notice delays of ~1sec as long as it=
never interferes with their typing or indentation (see IV below)= .

So after looking at my ~8000 lines of elisp devoted to = parsing
JavaScript, I weighed it and decided not to support parti= al parsing.
It's certainly possible to support it, but I thin= k my time would be
better spent on things that average users are more likely to notice.

YMMV, of course.

The upsho= t is that if I'm going to support mumamo, it will need
to wor= k within js2-mode's existing full-reparse framework. =A0I can
think of various ways to make it work, though, and as I mentioned
I'll talk to Lennart about it.

IV. =A0In= dentation

The indentation in js2-mode is broken. = =A0I'll be the first to say it.

It is based on the indentation in Karl Langstrom's = mode, which does a
better job for JavaScript than any indenter ba= sed on cc-engine, but
that doesn't mean it's a good job. = =A0And it's essentially unconfigurable.

espresso-mode shares this problem, which means that for= this
important use case it is not an improvement over js2-mode.<= /div>

Daniel's objections to js2-mode's non-inte= raction with font-lock
apply equally to the non-interaction with cc-engine's indentation<= /div>
configuration system. =A0The indent configuration for JavaScript = should
share as many settings as practical with cc-mode.

I actually made a serious attempt to generate the `c-style-alist'
data structure for js2-mode using the parse tree, but ran into three
issues:

  1) it's much harder than I thought it would be, even with a full
     parse tree available.  I had some 2000 lines of elisp invested
     in it when I pooped out, to be perfectly frank.

  2) `c-style-alist' (like font-lock) does not have enough semantic
     variables to encompass the range of indentation contexts that
     JavaScript programmers care about.  I think we'd need to add
     5-10 more, although it's been 18 months since I looked into it.

  3) indentation in "normal" Emacs modes also runs synchronously as
     the user types.  Waiting 500-800 msec or more for the parse to
     finish is (I think) not acceptable for indentation.  For small
     files the parse time is acceptable, but it would not be generally
     scalable.

#3 is the reason I gave up on #1.  It didn't seem to be worth the
effort to produce an accurate but slow indenter.

I don't know exactly how to solve this problem.  I have lots of
ideas, but it appears there are few low-hanging fruit in this space.

V. Font Lock framework design problems

There seems to be a common misconception flitting about to the
effect that font-lock is perfect and will never need to change.

This is a somewhat paradoxical viewpoint in view of the corpses
littering the path to jit-lock, which include font-lock, fast-lock,
lazy-lock, and vapor-lock.  Each decade we've had a cadre of people
claiming that *-lock meets everyone's needs, and then it gets rewritten
anyway.

So it's hard to understand how it remains such a popular viewpoint.

I'll make yet another attempt to dispel it, since once we're past the
emotional stumbling blocks, font-lock may be able to evolve again.

Va) Inadequate/insufficient style names

There are not enough font-lock faces to represent all the semantic
style runs that are identifiable to "real" language analyzers.
js2-mode makes several semantic distinctions not available in most
Emacs modes, although such distinctions are available in JDEE and
other Cedet-enabled modes, so js2-mode is by no means alone in its
needs.

In addition to the autoloaded font-lock faces, which js2-mode uses
whenever possible, js2-mode defines several new faces, including:

  * function parameters
  * "class" instance members (in JS, prototype and instance props)
  * local variables
  * undeclared variables
  * private members (although I implemented it poorly -- see below)
  * html/xml tags, attr names and delimiters -- used both for html
    in jsdoc comments and for E4X literals
  * doc tags such as those typically found in javadoc/jsdoc comments
  * warnings, errors, and informational diagnostics

I do not expect that this set is all-inclusive -- over time as js2-mode
and similar modes get smarter, they will be able to make other
semantic distinctions that users may wish to customize independently.
Given that Emacs is the most configurable editor on the planet, I do
not see any reason to entertain arguments to the contrary.

Vb) Ad-hoc default faces that are not being autoloaded

There are some modes (e.g. sgml-mode, html-mode, nxml-mode) that
define their own versions of some of the xml/html faces, but it did
not seem right to make js2-mode 'require one of these modes just to
get at ad-hoc "standard" definitions for these faces.

We should define standard faces for xml/html tags and entities, and
for any other faces that are effectively defined by 2 or more modes.

Vc) Additional semantic styles not needed by JavaScript

I have other language modes in progress, and together they define an
ever larger set of semantic styles.  The set of available font-lock
names should try to encompass the _union_ of the needs of most
languages, not the intersection.  There should, for instance, be a
font-lock-symbol-face for languages with distinguished symbols such
as Lisp, Scheme and Ruby.

I think this is relatively easy to fix, provided a little thought
goes into choosing the new faces.  Vd and Ve below should help
clarify why it requires greater than zero thought.

Vd) Composable semantic styles

Some font-lock faces represent "primary" semantic roles, in a vague
way.  For instance, there is a font-lock-function-name-face, and
this is different from font-lock-variable-name-face.  While in some
languages (including JavaScript) the distinction is not necessarily
exact, they can usually be reconciled -- e.g. being a function is
a more "important" property of an identifier than being a variable.

Most of the font-lock faces represent very common primary roles:
strings, comments, keywords, types, preprocessor macros.  But not all.
font-lock-constant-face is actually orthogonal to the primary role.
A class or method or parameter can be const or non-const in some
languages.

The semantic notion of public/private/protected/package/friend
visibility is another example.  So is "abstract"/"pure virtual".

Emacs supports composable faces (a style run may have multiple
faces, and the attributes compose according to predefined rules),
but font-lock provides neither consistent nor adequate support for
this notion.
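
As a rough sketch of the composition Emacs itself already allows
(this is not something font-lock arranges for you): the `face' text
property may hold a list of faces, and faces earlier in the list take
priority when attributes conflict.  The names
`my-visibility-private-face' and `my-apply-composed-faces' are
hypothetical, made up purely for illustration.

```elisp
;; Hypothetical "modifier" face: it sets only the slant and leaves
;; color and weight to whatever primary face it is combined with.
(defface my-visibility-private-face
  '((t :slant italic))
  "Hypothetical modifier face marking private members.")

(defun my-apply-composed-faces (start end primary &rest modifiers)
  "Put PRIMARY plus MODIFIERS on the text between START and END.
The `face' property becomes a list; MODIFIERS come first, so their
attributes win over PRIMARY where both specify the same attribute."
  (put-text-property start end 'face
                     (append modifiers (list primary))))
```

A private function name could then carry both roles at once, e.g.
(my-apply-composed-faces beg end 'font-lock-function-name-face
'my-visibility-private-face).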

Ve) Ambiguous semantic styles

At least one of the face names is ambiguous -- it's not clear what
font-lock-builtin-face is actually supposed to highlight.  The result
is that different language modes use it for different kinds of
entities.  If you customize the face for one mode, you may wind up
with unsatisfying results in another mode due to the differences
in relative weighting/distribution of semantic types across languages.

As a hypothetical example, someone might enhance python-mode to
use font-lock-builtin-face to highlight True/False/None and possibly
"self", since they're not keywords but they are all handled specially
by the runtime.  (font-lock-type-face might be better for this, but
since they're not really classes, you could argue it either way).
These tokens appear relatively infrequently in Python.  If someone
else were to use it to highlight functions implemented in C in elisp,
there would be a lot more of that face appearing in elisp buffers,
and it might not be easy to choose one face that looks nice in both
situations.

Regardless of the fate of js2-mode, font-lock needs to add more
semantic faces.  By default these new faces might simply inherit face
attributes from their "syntactic parents" -- e.g. the faces for
locals, parameters, instance and static vars might all inherit the
settings for `font-lock-variable-name-face'.  But users should be
able to differentiate among them when the information is available.
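
A minimal sketch of what such inheriting faces could look like.  The
two face names below are hypothetical (they are not part of font-lock
or js2-mode); only `font-lock-variable-name-face' and the `:inherit'
attribute are real.

```elisp
;; Hypothetical semantic faces.  Each one starts out looking exactly
;; like its "syntactic parent", but can be customized independently.
(defface my-font-lock-parameter-face
  '((t :inherit font-lock-variable-name-face))
  "Hypothetical face for function parameters."
  :group 'font-lock-faces)

(defface my-font-lock-local-variable-face
  '((t :inherit font-lock-variable-name-face))
  "Hypothetical face for local variables."
  :group 'font-lock-faces)
```

Users who don't care see no change; users who do can customize
either face without touching the parent.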

Vf) No font-lock interface for setting exact style runs

I could be mistaken here -- if so, please correct me.

My limited understanding of font-lock and its main entry-point
mechanisms such as font-lock-keywords and font-lock-apply-highlight,
all of which use the MATCH-HIGHLIGHT data structure, is that they
are not quite powerful enough for my needs in their current incarnation.

This issue is independent of asynchronous parsing -- I think that
even if my parser were instantaneous, I would still have this issue.

The problem is that I need a way, in a given font-lock redisplay, to
say "highlight the region from X to Y with text properties {Z}".

This use case does not seem like it should be inordinately difficult
to support, but it does not seem to be supported today.
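
To make the request concrete, here is a minimal sketch of the
operation itself, done with plain text properties outside of
font-lock.  `my-apply-style-runs' and the (START END FACE) run format
are assumptions for illustration, not js2-mode's actual code.  Note
that when font-lock refontifies a region it is likely to clear or
overwrite `face' properties set this way, which is the gap in
question.

```elisp
(defun my-apply-style-runs (runs)
  "Apply RUNS to the current buffer.
RUNS is a list of (START END FACE) triples, a hypothetical format
standing in for whatever a parser would produce."
  (dolist (run runs)
    (let ((start (nth 0 run))
          (end   (nth 1 run))
          (face  (nth 2 run)))
      ;; One exact style run: no regexp matching involved.
      (put-text-property start end 'face face))))
```

For example, (my-apply-style-runs '((1 4 font-lock-keyword-face)
(5 6 font-lock-variable-name-face))) styles "var" and "x" in a buffer
containing "var x = 1;".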

When I assert that it's not possible, I understand that it's
_theoretically_ possible.  Given a JavaScript file with 2500 style
runs, assuming I had that information available at font-lock time, I
could return a matcher that contains 2500 regular expressions, each
one of which is tailored to match one and exactly one region in the
buffer.

In practice, however, I am not aware of a way to do this that is
either clean or efficient.

If this simple feature were supported, I would have a great deal more
incentive to try to get my parsing to be fast enough to work within
the time constraints users expect from font-lock.

Vg) Lack of differentiation between mode- and minor-mode styles

One of the most common complaints from the thousands of users of
js2-mode, most of whom have exercised enough self-restraint to use the
term "work in progress" in preference to "abomination", is that
js2-mode has poor support for minor modes that do their work with
font-lock -- 80-column highlighters being a popular example, although
there are others.

The fundamental problem here is that the font-lock framework does not
differentiate between the mode's syntax highlighting and the keywords
installed by minor modes and by user code.  Instead, it merges them.

As far as I can tell, the officially supported mechanism for
adding additional font-lock patterns is `font-lock-add-keywords'.
This either appends or prepends the keywords to the defaults.
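
For reference, this is roughly what an 80-column highlighter does
with that mechanism.  `my-long-line-regexp' and
`my-highlight-long-lines' are hypothetical names for this sketch;
`font-lock-add-keywords' and `font-lock-warning-face' are real.

```elisp
(defconst my-long-line-regexp "^.\\{80\\}\\(.+\\)$"
  "Match lines longer than 80 characters; group 1 is the overflow.")

(defun my-highlight-long-lines ()
  "Ask font-lock to flag text past column 80 in the current buffer."
  (font-lock-add-keywords
   nil                                  ; nil = current buffer only
   `((,my-long-line-regexp 1 'font-lock-warning-face t))
   'append))                            ; add after the existing keywords
```

Once added, this entry simply becomes part of the buffer's
font-lock-keywords alongside whatever the major mode installed.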

It might be possible to reverse-engineer it, for instance by manually
diffing the buffer's font-lock-defaults and font-lock-keywords and
trying to figure out which ones were added by participants other than
the major mode.  Even if it's possible, it's not clear that it always
works now, and would always work in the future.

For one thing, it's possible (as Daniel observes) to bypass this
mechanism and call font-lock-apply-highlight directly, which makes
the reverse-engineering even more cumbersome and fragile.

(Vf) is the reason (Vg) is a problem for js2-mode.  font-lock-defaults
does not seem to be a very satisfactory way to apply 2000-10000
precise style runs to a buffer, so I do all my own highlighting,
and it doesn't include style-run contributions from minor modes.

I've made some halfhearted attempts to hack around the problem, but
they've proven fragile.  If font-lock were to support (Vf), then I
think (Vg) should "just work".

VI.  Summary

I've called out some of the main integration issues I've encountered.
I've penned several major and minor language modes, not just js2-mode,
and I've chosen to whine here about the problems that could best be
classified as "problem themes".

I'm around, and I'm available for nontrivial work.  If group consensus
is that js2-mode isn't ready yet, I'm happy to keep hacking on it and
taking user patches and feedback until Emacs 24 rolls around.

But it would be nice to have more direct support for modes like mine.
I'm willing to do my end of it, but I'm always oversubscribed, and I've
already signed up to support mouse-enter and mouse-left text props
as part of another js2-mode-related thread.

So a little help would go a long way.

-steve
--001636164225fb42000470d81398--