From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Alan Mackenzie <acm@muc.de>
Newsgroups: gmane.emacs.devel
Subject: Re: Last steps for pretesting (font-lock-extend-region-function)
Date: Tue, 25 Apr 2006 11:33:55 +0000 (GMT)
Message-ID: <Pine.LNX.3.96.1060425075047.675C-100000@acm.acm>
References: <873bg2g1fw.fsf-monnier+emacs@gnu.org>
Reply-To: Alan Mackenzie <acm@muc.de>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Trace: sea.gmane.org 1145964713 6741 80.91.229.2 (25 Apr 2006 11:31:53 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 25 Apr 2006 11:31:53 +0000 (UTC)
Cc: Richard Stallman <rms@gnu.org>, emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Apr 25 13:31:50 2006
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by ciao.gmane.org with esmtp (Exim 4.43)
	id 1FYLld-0005JY-0S
	for ged-emacs-devel@m.gmane.org; Tue, 25 Apr 2006 13:31:45 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1FYLlc-000251-5l
	for ged-emacs-devel@m.gmane.org; Tue, 25 Apr 2006 07:31:40 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1FYLk7-0000nF-Ih
	for emacs-devel@gnu.org; Tue, 25 Apr 2006 07:30:07 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1FYLk6-0000mr-Oi
	for emacs-devel@gnu.org; Tue, 25 Apr 2006 07:30:07 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1FYLk6-0000mk-Ek
	for emacs-devel@gnu.org; Tue, 25 Apr 2006 07:30:06 -0400
Original-Received: from [193.149.49.134] (helo=acm.acm)
	by monty-python.gnu.org with esmtp (Exim 4.52)
	id 1FYLmU-0005YJ-Vf; Tue, 25 Apr 2006 07:32:37 -0400
Original-Received: from localhost (root@localhost)
	by acm.acm (8.8.8/8.8.8) with SMTP id LAA00730;
	Tue, 25 Apr 2006 11:33:56 GMT
X-Sender: root@acm.acm
Original-To: Stefan Monnier <monnier@iro.umontreal.ca>
In-Reply-To: <873bg2g1fw.fsf-monnier+emacs@gnu.org>
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:53371
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/53371>

Hi, Stefan!

On Mon, 24 Apr 2006, Stefan Monnier wrote:

>>>> What is your objection to f-l-extend-region-f in a-c-f?  Is it because it
>>>> might gobble too much processor time?

>>> That and the fact that it's not necessary, and that to use it (as seen
>>> in your example) you need a good bit more work/code than doing it "the
>>> other way" (with code solely run from font-lock-default-fontify-region).

>> It will need an order of magnitude less work for f-l-extend-region-f than
>> for f-l-multiline properties.

>Are you talking about CPU-work or programmer work?

Programmer work.  I can state that with 100% certainty when that
programmer is me.  ;-(

>I was talking about programmer work, and I've shown you the code for
>font-lock-multiline: it's quite a bit shorter and more straightforward
>than your hack using b-c-f and a-c-f: just match the multiline element,
>place a font-lock-multiline property on it, and you're set.

"just", you write.  ;-)  The complexity of font-lock-keywords is an order
of magnitude higher than that of a defun, or a short sequence of related
defuns.  There's the complexity of where to quote, where to omit quotes,
wondering whether to put the "region" regexps before or after the face
setting regexps, whether to make f-l-m start-sticky or stop-sticky.... 

I find the elisp manual page which describes font-lock-keywords
("Search-based Fontification") all but inpenetrable.  It's reduced me to
tears on several occasions.  I've just spent five minutes looking through
it, trying to find the description of the form you suggested to me:

     (".*\\\\\n.*"
      (0 (progn (put-text-property (match-beginning 0) (match-end 0)
                                   'font-lock-multiline t)
                nil)))
 
.  I didn't find it.

Consider the difficulty of testing.  With the approach I've used, it's a
simple matter of C-u C-M-x, and edebugging through the function with M-:.
With your f-l-m approach, you've got to delve into the bowels of
font-lock, find the appropriate function (far from trivial: it's
f-l-default-f-r, I think), instrument it, f and blind as recursive levels
of jit lock build up on your screen, turn off jit-lock (a far from
trivial operation when you've not done it before - it's not documented),
start the whole process again.

>> At least, it will for those hackers who aren't totally familiar with
>> the internals of font-lock-keywords, etc.

>You don't have to be familiar with the internals.  At least not more
>than necessary to write the proper foo-before-change-function plus
>foo-font-lock-lock-extend-region-after-change.

With respect, Stefan, that's rubbish.  You need to know where and how to
write the above Lisp form (the one starting (".*\\\\n.*" ...)) into
f-l-keywords.  For me, that's perhaps half an hour to an hour of
frustration trying to find the right place in "Search-based
Fontification".

>> f-l-e-r-f only needs to be applied at the extremes of the region.
>> f-l-m needs to be applied throughout the entire region.  I doubt very
>> much whether f-l-m would use less processing power than f-l-e-r-f.

>As I said, the important aspect is not how much time it takes each time
>it's executed, but how many times it's executed.  In an
>after-change-function, it risks being executed enough times to become
>visible to the user.

[Hey, "beomce" was a difficult typo to correct ;-]

What about the run-time overhead of having an extra text property over
the entire buffer?  The f-l-e-r-f, as you envisage it (being called only
from f-l-default-fontify-r) will also be called at virtually every buffer
change, since almost every change causes a redisplay.

>> Also, f-l-extend-region-function (called from f-l-a-c-f/j-l-a-c) will
>> work EVERY time, because of its directness and simplicity.

>Reality check?  Here's your code:

>   (defvar c-awk-old-EOLL 0)
>   (make-variable-buffer-local 'c-awk-old-EOLL)
>   ;; End of logical line following the region which is about to be changed.
>   ;; Set in c-awk-before-change and used in c-awk-font-lock-extend-region.
   
>   (defun c-awk-before-change (beg end)
>   ;; This function is called exclusively from the before-change-functions hook.
>   ;; It does two things: Finds the end of the (logical) line on which END lies,
>   ;; and clears c-awk-NL-prop text properties from this point onwards.
>     (save-restriction
>       (save-excursion
>         (setq c-awk-old-EOLL (c-awk-end-of-logical-line end))
>         (c-save-buffer-state nil
>          (c-awk-clear-NL-props end (point-max))))))
>   (add-hook 'before-change-functions c-awk-before-change nil t)
   
>   (defun c-awk-end-of-change-region (beg end old-len)
>     ;; Find the end of the region which needs to be font-locked after a change.
>     ;; This is the end of the logical line on which the change happened, either
>     ;; as it was before the change, or as it is now, whichever is later.
>     ;; N.B. point is left undefined.
>     (max (+ (- c-awk-old-EOLL old-len) (- end beg))
>          (c-awk-end-of-logical-line end)))

>Here's my alternative implementation:

>   (defconst c-awk-font-lock-keywords
>      '(...
>        (".*\\\\\n.*"
>         (0 (progn (put-text-property (match-beginning 0) (match-end 0)
>                                      'font-lock-multiline t)
>                   nil)))
>        ...))

>Now which one's more direct and simple, really?

The one which is adequately commented.  The one which doesn't use an
obscure list structure that you need to look up in a manual.  The one
with thin, obvious interfaces with the rest of Emacs.  The one which is
trivial to debug with edebug.  The one which doesn't conflate determining
the font-lock region with actually fontifying it.  The one which runs
faster on a large region.

[ .... ]

>> It's me that's going to have to implement this mechanism for CC Mode
>> (all modes other than AWK Mode).  It's going to be difficult enough as
>> it is, without the added complications of the difficult to understand
>> the structure font-lock-keywords.

>Just take a deep breath, a couple of steps back, and try to look at the
>example use of font-lock-multiline above as if it were very simple.

I've tried, but I just don't have the intellectual capability to cope
with such complexity without the greatest of difficulty.  You obviously
do.  Complicated nested structures make my eyes glaze over.  There's a
good chance I'm not the only Emacs hacker who has this sort of trouble.

There's an assymetry between our positions: by threatening to remove the
f-l-e-r-f hook (after-change version), you're trying to force me to use
the f-l-m mechanism.  You're trying to impose your taste and judgement on
me, and I resent this quite a lot.  I think the f-l-m text property is a
revolting kludge, but I'm not trying to do away with it.  I'm happy
enough for other hackers to use it, though I'd rather not have to
maintain their code afterwards.

Is there any chance we could bring this discussion to an end, now?  The
things we've been talking about are important, but this discussion has
taken up an enormous amount of my time, and probably yours too.  We've
both got other urgent things to do.

[ .... ]

>> It isn't much effort to be correct, here.  Why not do it right?  (Yes,
>> I'm prepared to do this patch, too.)

>It's enough changes that I don't think it should go into Emacs-22.  I
>prefer keeping the same performance misfeature that we already had in
>Emacs-21 and release Emacs-22 a few days earlier.

>But if you want to fix it, here's how a good fix should work IMNSHO:
>- every function that get added to jit-lock-functions (such as
>  font-lock-fontify-region) should be changed to return the region
>  it actually jit'd.
>- jit-lock-fontify-now should then take the intersection of the returned
>  regions to determine where to set the `fontified' property.

My fix is simpler: to recognise that f-l-default-fontify-region performs
two disparate tasks: (i) Extending the region to be fontified; (ii)
Fontifying this region.  By extracting (ii) into a separate function,
that function could be called directly from jit-lock-fontify-now,
bypassing the redundant second massaging of the region.

[ .... ]

>> We are both aware of a refinement which is needed, namely calling some
>> sort of f-l-extend-region-f from f-l-d-f-r.

>It's not a refinement.  It's a requirement for correctness.

It is, and we need it.  It needs a name - here are the possibilities:

(i) It uses the same hook, font-lock-extend-region-function, as
f-l-after-change-functions, setting the third parameter, OLD-LEN, to nil;

(ii) It uses a distinct hook with a new name, say,
font-lock-extend-chunk-function;

(iii) It usurps the name f-l-e-r-f from its current use, abolishing the
region extension mechanism in the f-l-after-change-f/j-l-after-change.

I'd prefer (i), but would have no trouble with (ii).  I absolutely do NOT
want (iii).

>        Stefan

-- 
Alan.