From mboxrd@z Thu Jan  1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Alan Mackenzie<none@example.invalid>
Newsgroups: gmane.emacs.help
Subject: Re: eLisp fontlock with mmm-mode
Date: Fri, 12 Sep 2003 21:55:18 +0000
Organization: muc.de e.V. -- private internet access
Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Message-ID: <6cftjb.ck.ln@acm.acm>
References: <mailman.34.1062553939.18171.help-gnu-emacs@gnu.org>
	<151bebc0.0309030659.7ff80bb@posting.google.com>
	<3F561EAE.3030506@yahoo.com>
	<151bebc0.0309050702.b3a9555@posting.google.com>
	<7vsqjb.r5.ln@acm.acm>
	<151bebc0.0309120746.1820d513@posting.google.com>
NNTP-Posting-Host: deer.gmane.org
X-Trace: sea.gmane.org 1063405169 19614 80.91.224.253 (12 Sep 2003 22:19:29 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Fri, 12 Sep 2003 22:19:29 +0000 (UTC)
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Sat Sep 13 00:19:26 2003
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>
Original-Received: from monty-python.gnu.org ([199.232.76.173])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 19xwGE-0005BT-00
	for <geh-help-gnu-emacs@m.gmane.org>; Sat, 13 Sep 2003 00:19:26 +0200
Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
	by monty-python.gnu.org with esmtp (Exim 4.22)
	id 19xwCJ-0000R7-KI
	for geh-help-gnu-emacs@m.gmane.org; Fri, 12 Sep 2003 18:15:23 -0400
Original-Path: shelby.stanford.edu!newsfeed.stanford.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!feed.news.nacamar.de!news.space.net!news.muc.de!not-for-mail
Original-Newsgroups: gnu.emacs.help
Original-Lines: 155
Original-NNTP-Posting-Host: acm.muc.de
Original-X-Trace: marvin.muc.de 1063404213 75885 193.149.49.134 (12 Sep 2003 22:03:33
	GMT)
Original-X-Complaints-To: news-admin@muc.de
Original-NNTP-Posting-Date: 12 Sep 2003 22:03:33 GMT
User-Agent: tin/1.4.5-20010409 ("One More Nightmare") (UNIX) (Linux/2.0.35
	(i686))
Original-Xref: shelby.stanford.edu gnu.emacs.help:116571
Original-To: help-gnu-emacs@gnu.org
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.2
Precedence: list
List-Id: Users list for the GNU Emacs text editor  <help-gnu-emacs.gnu.org>
List-Unsubscribe: <http://mail.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://mail.gnu.org/pipermail/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <http://mail.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Xref: main.gmane.org gmane.emacs.help:12492
X-Report-Spam: http://spam.gmane.org/gmane.emacs.help:12492

Joe Kelsey <joe-gg@zircon.seattle.wa.us> wrote on 12 Sep 2003 08:46:51
-0700:
> Alan Mackenzie<none@example.invalid> wrote in message
> news:<7vsqjb.r5.ln@acm.acm>...
>> Joe Kelsey <joe-gg@zircon.seattle.wa.us> wrote on 5 Sep 2003 08:02:02
>> -0700:
>> > Kevin Rodgers <ihs_4664@yahoo.com> wrote in message
>> > news:<3F561EAE.3030506@yahoo.com>...  Joe Kelsey wrote: The
>> > syntax-table text property works differently from the global syntax
>> > table in that it applies to a specific section of the buffer.
>> > However, applying a syntax-table property to a specific section of
>> > text also involves a lot of extra overhead and thus it doesn't come
>> > cheaply.

>> Joe, I take it the value you are giving to the syntax-table text
>> property is a syntax-table, and you're giving this to the whole mmm
>> section of the buffer in a single operation.  What do you mean by "a
>> lot of extra overhead"?  Do you mean extra coding or sluggish
>> performance?  If the latter, do you have any quantitative feel for how
>> bad the hit is?

> The maintainer of mmm-mode felt that turning on
> parse-sexp-lookup-properties might impose unacceptable overhead on
> buffer activities.  I have no direct evidence of any performance
> penalties due to turning on parse-sexp-lookup-properties, but I bow to
> the owner of mmm-mode in his personal decisions.

syntax-table properties are in constant use in AWK Mode (which is part
of CC Mode).  I've never felt they impacted the performance
significantly, even on my 166 MHz dinosaur.  But, then again, large AWK
buffers are rare.  My feel (and it's no more than that) is that ST
properties will impact the performance, but to an acceptable degree on a
slow (< 200 MHz) machine, and barely noticeably on a fast (> 1 GHz)
machine.

>> > I have experimented in mmm-mode with using the syntax-table text
>> > property to make inactive overlays have specific properties to try
>> > to make indenting work better in multi-mode buffers.

>> You mean, you are setting the STTP to a harmless value (say the WS
>> code)?  In that case, how do you go about restoring the STTP value to
>> what the major mode might have set it to?

> I created a function to apply a syntax-table property to a set of
> regions like this:

[snipped ...]

OK.  This approach rules out the use of the syntax-table property by
major modes, if they are to be used in MMM Mode.  :-(

Maybe it would be possible to adapt the core to support several ST text
properties simultaneously (e.g. syntax-table, syntax-table-cc,
syntax-table-mason, ....), and to setq parse-sexp-lookup-properties to
one of these symbols rather than simply t.
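
Short of changing the core, the idea could perhaps be approximated in
Lisp by keeping each mode's values under its own property and copying
them onto syntax-table when that mode becomes active.  A minimal sketch,
assuming a hypothetical helper my-activate-syntax-property and the
per-mode property name syntax-table-cc from above (neither exists in
Emacs):

```elisp
;; Sketch only: `my-activate-syntax-property' and the per-mode
;; property `syntax-table-cc' are hypothetical names.
(defun my-activate-syntax-property (prop start end)
  "Copy the values of text property PROP onto `syntax-table'.
Operate on the text between START and END in the current buffer.
Where PROP is nil, `syntax-table' is set to nil too, so the
buffer's ordinary syntax table applies there."
  (let ((pos start))
    (while (< pos end)
      ;; Find the end of the run of text sharing one value of PROP.
      (let ((next (next-single-property-change pos prop nil end)))
        (put-text-property pos next 'syntax-table
                           (get-text-property pos prop))
        (setq pos next)))))
```

This is an emulation, not the core support suggested above: it rewrites
the property on every switch, so its cost grows with the size of the
region.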

>> > The real problem involves resolving the dichotomy between linear
>> > editing and the discontinuous nature of multi-mode files.  I don't
>> > really have a perfect solution right now.

>> It feels like we could do with some sort of support in the core for
>> multiple sections.  Something a bit like a region, or a narrowed
>> section, but independent of them, inside of which font-locking,
>> indentation and so on would be calculated.

> You seem to feel that some sort of "narrowing" function might work.
> However, in reality, when you look at multi-mode buffers as supported
> by mmm-mode, narrowing by itself does not provide enough context.
> Unfortunately, indentation engines and font-lock engines, at least as
> implemented by cc-mode, rely on a combination of syntax-table
> properties and regular expression searching to accomplish their tasks.

"Unfortunately"?  How else could CC Mode do it?

> For example, take a noweb file.  This consists of a literate program,

As an aside, could you explain what a "literate program" is, exactly?
What it's for, who uses it, and so on.

> actually a mixture of LaTeX and code in what Norman Ramsey calls
> "chunks".  Each documentation chunk describes ideas behind surrounding
> code chunks.  Part of the literate programming style involves the
> tangle and weave processes.  Tangling means reassembling the disjoint
> code chunks from a web file into a coherent whole for presentation to
> the compiler.  Weaving involves processing the entire web to markup
> sections appropriately, applying pretty-printing markup to the code and
> adjusting the web syntax markup to fit the documentation processor,
> such as LaTeX or HTML.

> While editing such buffers, you want to present different views of the
> buffer in different sections.  For instance, if you want to edit the
> documentation, then you want latex-mode to treat the code chunks as a
> single unmovable piece, essentially a syntactic word and not attempt
> to reformat it using its own peculiar ideas of paragraph formatting.

> Meanwhile, while editing code, you want cc-mode to completely ignore
> the documentation chunks.

How about setting something like c-doc-comment-start-regexp and so on to
something which would transform the documentation chunks into comments
(as far as CC Mode is concerned)?

> One of the most frustrating parts of this comes when cc-mode spots an
> unterminated apostrophe in a preceding documentation chunk and treats
> it as an unterminated string, thus completely screwing up the
> indentation of the code.

This is sort of semi-intentional in CC Mode, so that if you miss out a
required terminator (such as a " or a ; or a ') it fouls up the
indentation of the next line, thus drawing your attention to the error
before you get a compiler syntax message.  But what else could CC Mode
do?  C, C++, and friends are syntactically ghastly languages, and
analyzing them in the backwards direction (necessary for doing the
indentation) is even harder than in the forwards direction (like a
compiler does).
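
To make the apostrophe problem concrete, here is a minimal sketch of how
a syntax-table text property can hide a stray ' from the syntactic
scanner.  The function name and the sample text are made up for
illustration, and the plain syntax table with ' as a string delimiter
only stands in for what a C-like mode would use; this is not the
function Joe posted.

```elisp
;; Sketch only: `my-string-state-at-end' is a hypothetical helper.
(defun my-string-state-at-end (neutralize)
  "Return non-nil if the scanner ends up inside a string.
The scratch text is a documentation line containing a stray
apostrophe, followed by a line of code.  If NEUTRALIZE is non-nil,
give the documentation line punctuation syntax through the
`syntax-table' text property before scanning."
  (with-temp-buffer
    (set-syntax-table (make-syntax-table))
    ;; Treat ' as a string delimiter in this buffer's syntax table.
    (modify-syntax-entry ?\' "\"")
    (insert "Don't panic.\n")
    (let ((doc-end (point))
          ;; Make the scanner honor `syntax-table' text properties.
          (parse-sexp-lookup-properties t))
      (insert "int x;\n")
      (when neutralize
        (put-text-property (point-min) doc-end
                           'syntax-table (string-to-syntax ".")))
      ;; Element 3 of the parse state is non-nil inside a string.
      (nth 3 (parse-partial-sexp (point-min) (point-max))))))
```

Called with nil, the scanner still believes it is inside a string when
it reaches the code line; called with t, it does not.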

> Also, you may want to consider disjoint code sections as virtually
> appended to each other in order to carry forward indentation from one
> to the other.   Also, you may want to ignore other code sections
> depending on how related they are to each other, since the tangle
> process may move them around into different places, including
> separating them into completely different files (.c versus .h).

You actually need a computer to produce tangled code?  :-)

> I want to have "virtual views" of a buffer imposed upon major modes in
> order to restrict how far afield their regular expression linear
> searches can carry them.  Thus, I want to specify a set of regions
> which cc-mode can consider virtually catenated in order to restrict it
> to only looking at characters in that view.  Then it can use all of
> the regular-expression optimizations to supplement syntax-table
> properties it wants in order to work correctly in multi-mode buffers. 
> Something like narrow-to-region, but given a list of disjoint regions
> to narrow to.

Phew!  Just that, eh?  This certainly goes well beyond the boundaries of
anything CC Mode was ever intended to do.  It sounds to me rather like a
major enhancement to the Emacs core.

To start the ball rolling, here are some ideas for functions which could
offer this sort of functionality:

(virtualize-to-regions '((1 . 2019) (3350 . 6003) (6252 . 6290)))
(virtual-widen)
(save-virtual-restriction ....)    ; like save-restriction.

These would manipulate "virtual views".  All standard emacs functions
would then work on the view as though it were a single contiguous buffer.
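
Until something like that exists in the core, a crude read-only
approximation is to copy the regions into a scratch buffer and do the
analysis there.  A minimal sketch, assuming the same list-of-pairs
argument as above; my-call-with-virtual-view is a hypothetical name, not
an existing function:

```elisp
;; Sketch only: emulates a "virtual view" by copying, so changes made
;; in the scratch buffer do NOT reach the original buffer.
(defun my-call-with-virtual-view (regions function)
  "Call FUNCTION in a temporary buffer holding only REGIONS.
REGIONS is a list of (START . END) pairs of positions in the
current buffer.  The pieces are catenated in list order, with
their text properties.  Return whatever FUNCTION returns."
  (let ((source (current-buffer)))
    (with-temp-buffer
      (dolist (region regions)
        (insert-buffer-substring source (car region) (cdr region)))
      (goto-char (point-min))
      (funcall function))))

;; Usage: look only at the two code lines, skipping the text between.
(with-temp-buffer
  (insert "int a;\nDon't look here.\nint b;\n")
  (my-call-with-virtual-view '((1 . 8) (25 . 32)) #'buffer-string))
;; => "int a;\nint b;\n"
```

A real implementation would have to map positions and edits back to the
original buffer, which is exactly the part that needs core support.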

> /Joe

-- 
Alan Mackenzie (Munich, Germany)
Email: aacm@muuc.dee; to decode, wherever there is a repeated letter
(like "aa"), remove half of them (leaving, say, "a").