unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Re: font-locking and open parens in column 0
@ 2006-11-02  8:49 Mackenzie, Alan
  2006-11-02 18:31 ` martin rudalics
  0 siblings, 1 reply; 8+ messages in thread
From: Mackenzie, Alan @ 2006-11-02  8:49 UTC (permalink / raw)
  Cc: Martin Rudalics, Richard Stallman

Hi, Emacs!

[Apologies for not being able to reply with proper headers, since I'm
still without a network connection at home.  Please send any CCs to
acm@muc.de as well as my work address.]

Martin Rudalics wrote on Monday 11th September 2006, 15:05:55 +0200:

> (1) With emacs -Q open ~/src/syntax.c

> (2) Execute

>(defun foo ()
>  (interactive)
>  (re-search-forward "string-to-syntax")
>  (forward-line 6)
>  (recenter 0))

>This will fontify the entire body of `string_to_syntax' as a C string
>due to the left paren in column zero of the doc-string.

The cause of this (as Martin (almost) discerned) is that the handling of
(eq open-paren-in-column-0-is-defun-start nil) in begining-of-defun-raw
hasn't been implemented.  The function just looks for a "(" in C0
regardless of that variable.

"Clearly", when that variable is nil, a defun can begin at no place
other
than a paren at the outermost level.  Therefore, the function must scan
the entire source file from BOB, as in the earliest days.

The patch below implements this.  When applied (don't forget to rebuild
Emacs or M-x load-file lisp.elc, since lisp.elc is a preloaded file),
Emacs fontifies string-to-syntax properly after jumping there with M-x
foo.


2006-11-01  Alan Mackenzie  <acm@muc.de>

	* emacs-lisp/lisp.el (beginning-of-defun-raw): Code up the case
	(eq open-paren-in-column-0-is-defun-start nil) by searching for
	least nested open-paren.


*** lisp-1.74.el	2006-02-19 12:51:43.000000000 +0000
--- lisp.el	2006-11-01 22:03:56.313088952 +0000
***************
*** 208,229 ****
  
  If variable `beginning-of-defun-function' is non-nil, its value
  is called as a function to find the defun's beginning."
!   (interactive "p")
!   (if beginning-of-defun-function
!       (if (> (setq arg (or arg 1)) 0)
! 	  (dotimes (i arg)
! 	    (funcall beginning-of-defun-function))
! 	;; Better not call end-of-defun-function directly, in case
! 	;; it's not defined.
! 	(end-of-defun (- arg)))
!     (and arg (< arg 0) (not (eobp)) (forward-char 1))
      (and (re-search-backward (if defun-prompt-regexp
  				 (concat (if
open-paren-in-column-0-is-defun-start
  					     "^\\s(\\|" "")
  					 "\\(?:" defun-prompt-regexp
"\\)\\s(")
  			       "^\\s(")
! 			     nil 'move (or arg 1))
! 	 (progn (goto-char (1- (match-end 0)))) t)))
  
  (defvar end-of-defun-function nil
    "If non-nil, function for function `end-of-defun' to call.
--- 208,270 ----
  
  If variable `beginning-of-defun-function' is non-nil, its value
  is called as a function to find the defun's beginning."
!   (interactive "p") ; change this to "P", maybe, if we ever come to
pass ARG
! 		    ; to beginning-of-defun-function.
!   (unless arg (setq arg 1))		; The call might not be
interactive.
!   (cond
!    (beginning-of-defun-function
!     (if (> arg 0)
! 	(dotimes (i arg)
! 	  (funcall beginning-of-defun-function))
!       ;; Better not call end-of-defun-function directly, in case
!       ;; it's not defined.
!       (end-of-defun (- arg))))
! 
!    ((or defun-prompt-regexp open-paren-in-column-0-is-defun-start)
!     (and (< arg 0) (not (eobp)) (forward-char 1))
      (and (re-search-backward (if defun-prompt-regexp
  				 (concat (if
open-paren-in-column-0-is-defun-start
  					     "^\\s(\\|" "")
  					 "\\(?:" defun-prompt-regexp
"\\)\\s(")
  			       "^\\s(")
! 			     nil 'move arg)
! 	 (progn (goto-char (1- (match-end 0)))) t))
! 
!    (t
!     ;; Column 0 has no significance - so scan forward from BOB to see
how
!     ;; nested point is, then carry on from there.
!     (let ((floor (point-min))
! 	  (ceiling (point-max))
! 	  pps-state nesting-depth)
!       (save-restriction
! 	(widen)
! 	(setq pps-state (parse-partial-sexp (point-min) (point))
! 	      nesting-depth (nth 0 pps-state))
! 	;; Get outside of any string or comment.
! 	(if (nth 8 pps-state)
! 	    (goto-char (nth 8 pps-state)))
! 
! 	(cond
! 	 ((> arg 0)
! 	  (when (> nesting-depth 0)
! 	    (up-list (- nesting-depth))
! 	    (setq arg (1- arg)))
! 	  ;; We're now outside of any defun.
! 	  (backward-list arg)
! 	  (if (< (point) floor) (goto-char floor)))
! 
! 	 ((< arg 0)
! 	  (cond
! 	   ((> nesting-depth 0)
! 	    (up-list nesting-depth)
! 	    (setq arg (1+ arg)))
! 	   ((not (looking-at "\\s("))
! 	    ;; We're between defuns, and not at the start of one.
! 	    (setq arg (1+ arg))))
! 	  (forward-list (- arg))
! 	  (down-list)
! 	  (backward-char)
! 	  (if (> (point) ceiling) (goto-char ceiling)))))))))
  
  (defvar end-of-defun-function nil
    "If non-nil, function for function `end-of-defun' to call.


-- 
Alan Mackenzie (Ittersbach, Germany).



*******************************************
Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den Absender und loeschen Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet.
 
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, disclosure or distribution of the contents in this e-mail is strictly forbidden.
*******************************************

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: font-locking and open parens in column 0
  2006-11-02  8:49 Mackenzie, Alan
@ 2006-11-02 18:31 ` martin rudalics
  0 siblings, 0 replies; 8+ messages in thread
From: martin rudalics @ 2006-11-02 18:31 UTC (permalink / raw)
  Cc: Alan Mackenzie, Richard Stallman, emacs-devel

Good evening, Alan

 > The cause of this (as Martin (almost) discerned) is that the handling of
 > (eq open-paren-in-column-0-is-defun-start nil) in begining-of-defun-raw
 > hasn't been implemented.  The function just looks for a "(" in C0
 > regardless of that variable.
 >
 > "Clearly", when that variable is nil, a defun can begin at no place
 > other
 > than a paren at the outermost level.  Therefore, the function must scan
 > the entire source file from BOB, as in the earliest days.

No!  That would be a serious regression.  Font-locking should never be
forced to scan from BOB.  Observe that this would only serve to handle
the rare case where a user puts a paren in column zero of a C comment.
I think warning about such parens as in emacs-lisp-mode is sufficient.
Please try to make use of Richard's `font-lock-syntax-paren-check' from
this thread as with

     (put font-lock-beginning-of-syntax-function
	 'font-lock-syntax-paren-check t)

Moreover, any such code as yours should

(1) Consult `syntax-ppss' first.

(2) Try to use the 9th field of the return value of `parse-partial-sexp'
to find the outermost paren instead of up-/forward-/backward-listing.

(3) Crowd the cache of `syntax-ppss' in order to avoid further scans.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: font-locking and open parens in column 0
  2006-11-03  8:44 AW: " Mackenzie, Alan
@ 2006-11-03 14:14 ` Stefan Monnier
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Monnier @ 2006-11-03 14:14 UTC (permalink / raw)
  Cc: martin rudalics, Richard Stallman, emacs-devel

>> > The cause of this (as Martin (almost) discerned) is that the handling of
>> > (eq open-paren-in-column-0-is-defun-start nil) in begining-of-defun-raw
>> > hasn't been implemented.  The function just looks for a "(" in C0
>> > regardless of that variable.

According to its docstring, the current behavior is correct:

   Normally a defun starts when there is a char with open-parenthesis
   syntax at the beginning of a line.

I think the bug is in cc-mode's use of beginning-of-defun.
BTW, the alternative used in elisp might be OK as well: flag with a big
`warning' face those spots in the file which match defun-prompt-regexp (or
where the line starts with an open paren) but which are inside a comment or
string.  This way you may get erroneous fontification, but the root cause is
clearly marked.



        Stefan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: font-locking and open parens in column 0
@ 2006-11-03 16:19 Mackenzie, Alan
  2006-11-03 19:29 ` Stefan Monnier
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Mackenzie, Alan @ 2006-11-03 16:19 UTC (permalink / raw)
  Cc: Alan Mackenzie, Richard Stallman, emacs-devel

Hi, Martin! 

>-----Ursprüngliche Nachricht-----
>Von: martin rudalics [mailto:rudalics@gmx.at] 
>Gesendet: Freitag, 3. November 2006 15:03
>An: Mackenzie, Alan
>Cc: Richard Stallman; emacs-devel@gnu.org; Alan Mackenzie
>Betreff: Re: AW: font-locking and open parens in column 0
>
>Good afternoon, Alan
>

[ .... ]

> > Are you saying that
> > open-paren-in-column-0-is-defun-start shouldn't exist at all?  When
> > it is nil, a paren in column 0 may not, of itself, be regarded as a
> > defun start.

>I fail to understand the present state of things.  On the one hand,
>`open-paren-in-column-0-is-defun-start' is customizable which means a
>user should be able to set it and a major mode should respect that.  On
>the other hand, c-mode deliberately sets this to nil.  I think users
>should be free to express their choice here if they consider their
>machine inapt for scanning from bob.

As far as I can see, the purpose of open-p-i-c-0-i-d-start wasn't
clearly recorded in the ChangeLog.  The value t is clear; I can't think
of any meaning for nil other than the one I've given it.  Certainly,
Martin Stjernholm assumed that meaning (goto a brace at top level) when
he coded up c-beginning-of-defun.  If this meaning is not given to it,
there doesn't seem to be any point in having the variable.

I think it was made customizable so that a user could set it for
"efficiency" (when his perl files' functions all have { at column 0) or
"rigour" (when he uses "k&r" placement of braces in his files.pl).  It
doesn't seem to have worked very well, so far.

CC Mode always sets o-p-i-c-0-i-d-start to nil, and caches the brace
structure to prevent excessive scanning from BOB.  After all, opic0id
being nil will always find BO-defun.  Setting it to t was an optimisation
for when computers were much less powerful than they are now - and this
causes quite a bit of inconvenience.

> > I would say, rather, that font-locking should not use b-o-defun-raw
> > when o-p-i-c-0-i-d-s is nil, except in exceptional circumstances.

>Font-lock uses `syntax-ppss' which may call `syntax-begin-function'
>which may be defined as `beginning-of-defun' which usually calls
>`beginning-of-defun-raw'.  When I open a C file and jump to a position
>before stealth fontification gets there, that's the way things behave.

OK

> > The case
> > you spotted in syntax.c (and I've really no idea how you did ;-), is
> > such an exceptional case.

>I spotted that incidentally when scrolling backwards through syntax.c.
>Anyway, it *is* exceptional and thus should not warrant any major
>change.  Richard's patch just comes in handy.

I confess I'm not familiar with this.  I've been without net access for
several weeks.  I'm currently at the mercy of Deutsche Telekom, who have
promised to get my new DSL connection working real soon now.  :-(

> > CC Mode caches parenthesis structures.

>... which parallels the work of `syntax-ppss', hence we currently end up
>with two caches for the same structures - I know c-mode has to work hard
>to handle all sorts of older (X)Emacsen ...

Not only that, CC Mode uses its cache for things other than font-locking.

> > My patch did fix the bug (a whole screenful of misfontified string in
> > syntax.c), though, didn't it?

>It does fix it, and it's even pretty fast ;-).  But I still think the
>"bug" is with the author who put the left paren in column zero of that
>comment.  That author should be warned just as in emacs-lisp-mode.

Should _HAVE_ been warned.  Martin Stjernholm has worked very hard to
remove that restriction from C Mode without sacrificing (much) speed.  It
would be a shame now to leave the restriction in Emacs 22.

[ .... ]

-- 
Alan.



*******************************************
Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den Absender und loeschen Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet.
 
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, disclosure or distribution of the contents in this e-mail is strictly forbidden.
*******************************************

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: font-locking and open parens in column 0
  2006-11-03 16:19 font-locking and open parens in column 0 Mackenzie, Alan
@ 2006-11-03 19:29 ` Stefan Monnier
  2006-11-04 11:35   ` martin rudalics
  2006-11-04  9:30 ` martin rudalics
  2006-11-05  7:08 ` Richard Stallman
  2 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier @ 2006-11-03 19:29 UTC (permalink / raw)
  Cc: martin rudalics, emacs-devel, Richard Stallman, Alan Mackenzie

> CC Mode always sets o-p-i-c-0-i-d-start to nil, and caches the brace
> structure to prevent excessive scanning from BOB.

Interestingly, it may not be sufficient.  There are basically only 2 uses of
open-paren-in-column-0-is-defun-start in Emacs:
- one in lisp.el to define beginning-of-defun (in order to know whether to
  add "^" to defun-prompt-regexp).
- one in syntax.c used in forward-comment when scanning comments backward.

beginning-of-defun has always been based on regexps and not on syntax, so
open-paren-in-column-0-is-defun-start in there has no significant impact
w.r.t. scanning from the beginning of the buffer or not: you can change
defun-prompt-regexp instead and get the same result.

OTOH in forward-comment it's used within C code with no way to hook
into it.  So the paren cache you use has no effect on this :-(
It's on my TODO list to change forward-comment so that it calls syntax-ppss
when needed rather than scan from the beginning of the defun/buffer.

> After all, opic0id being nil will always find BO-defun.  Setting it to
> t was an optimisation for when computers were much less powerful than they
> are now - and this causes quite a bit of inconvenience.

It's an algorithmic optimization, and since in computer science, problems
tend to get bigger as we get bigger machines, it's not clear that the
optimization is less useful now than it was then.  I do believe it is less
useful because we're lucky enough that our "problems" haven't gotten much
bigger: source code size is limited by human constraints more than by the
machine (except for computer-generated code, obviously), and we haven't
noticeably complexified our parsing technology (contrary to many other
programming environments).

>> ... which parallels the work of `syntax-ppss', hence we currently end up
>> with two caches for the same structures - I know c-mode has to work hard
>> to handle all sorts of older (X)Emacsen ...
> Not only that, CC Mode uses its cache for things other than font-locking.

syntax-ppss is used by font-lock but not only.  Whether you could use it for
the other places where you use your cache?  I don't know.  I hope so.
But if you can't I'd be interested to hear about it (mostly to figure out
in which direction syntax-ppss should be improved).

> Should _HAVE_ been warned.  Martin Stjernholm has worked very hard to
> remove that restriction from C Mode without sacrificing (much) speed.  It
> would be a shame now to leave the restriction in Emacs 22.

In that case, I believe a better patch than the one you proposed is to
change cc-fonts.el so as to set syntax-begin-function to nil.  After all,
the whole point of syntax-begin-function is to provide a *heuristic* that
can "quickly" find a safe starting point for parsing.  If you want your
parsing to be more reliable than a heuristic, then just don't use that.


        Stefan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: font-locking and open parens in column 0
  2006-11-03 16:19 font-locking and open parens in column 0 Mackenzie, Alan
  2006-11-03 19:29 ` Stefan Monnier
@ 2006-11-04  9:30 ` martin rudalics
  2006-11-05  7:08 ` Richard Stallman
  2 siblings, 0 replies; 8+ messages in thread
From: martin rudalics @ 2006-11-04  9:30 UTC (permalink / raw)
  Cc: Alan Mackenzie, Richard Stallman, emacs-devel

 >> But I still think the
 >>"bug" is with the author who put the left paren in column zero of that
 >>comment.  That author should be warned just as in emacs-lisp-mode.
 >
 >
 > Should _HAVE_ been warned.  Martin Stjernholm has worked very hard to
 > remove that restriction from C Mode without sacrificing (much) speed.  It
 > would be a shame now to leave the restriction in Emacs 22.

Which means that Martin Stjernholm's change did not make it to
font-locking which, for c-mode, still uses the `beginning-of-defun' from
lisp-mode.  I see only two reasonable fixes to this:

(1) Set `syntax-begin-function' (or `beginning-of-defun-function') to
     `c-beginning-of-defun' if you trust that change.  This should,
     hopefully, populate your cache.

(2) Set `syntax-begin-function' to nil as Stefan suggested and let
     `syntax-ppss' do the work.  This should populate Stefan's cache.

Any other solution would be a shameful waste of resources since the
results of `parse-partial-sexp' from your original proposal don't get
stored anywhere (jumping to a position n in a c-mode buffer and
subsequently jumping to a position m < n would afford to rescan from
bob).  Also make sure that `font-lock-beginning-of-syntax-function'
nowhere interferes with these settings.

Ideally, there would be one and only one cache populated and consulted
by both - c-mode and `syntax-ppss'.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: font-locking and open parens in column 0
  2006-11-03 19:29 ` Stefan Monnier
@ 2006-11-04 11:35   ` martin rudalics
  0 siblings, 0 replies; 8+ messages in thread
From: martin rudalics @ 2006-11-04 11:35 UTC (permalink / raw)
  Cc: Mackenzie, Alan, emacs-devel, Richard Stallman, Alan Mackenzie

 > beginning-of-defun has always been based on regexps and not on syntax, so
 > open-paren-in-column-0-is-defun-start in there has no significant impact
 > w.r.t. scanning from the beginning of the buffer or not: you can change
 > defun-prompt-regexp instead and get the same result.

Does this mean we could use `syntax-ppss' and the car of the 9th field
instead and get rid of `open-paren-in-column-0-is-defun-start' once the
`forward-comment'/`find_defun_start' issue has been resolved?  Or,
better, give find_defun_start an additional argument, say "current", to
have it insist on finding the start of the current defun?

 > ... and we haven't
 > noticeably complexified our parsing technology (contrary to many other
 > programming environments).

Once `parse-partial-sexp' is able to handle alternative comment styles
or syntax-table properties (instead of the current text property ersatz)
parsing from bob will get more expensive.

 > syntax-ppss is used by font-lock but not only.  Whether you could use it for
 > the other places where you use your cache?  I don't know.  I hope so.
 > But if you can't I'd be interested to hear about it (mostly to figure out
 > in which direction syntax-ppss should be improved).

`syntax-ppss' *is* a major improvement.  It lacks a documentation of how
its cache gets filled and a way to customize that.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: font-locking and open parens in column 0
  2006-11-03 16:19 font-locking and open parens in column 0 Mackenzie, Alan
  2006-11-03 19:29 ` Stefan Monnier
  2006-11-04  9:30 ` martin rudalics
@ 2006-11-05  7:08 ` Richard Stallman
  2 siblings, 0 replies; 8+ messages in thread
From: Richard Stallman @ 2006-11-05  7:08 UTC (permalink / raw)
  Cc: rudalics, acm, emacs-devel

    CC Mode always sets o-p-i-c-0-i-d-start to nil, and caches the brace
    structure to prevent excessive scanning from BOB.  After all, opic0id
    being nil will always find BO-defun.  Setting it to t was an optimisation
    for when computers were much less powerful than they are now - and this
    causes quite a bit of inconvenience.

That optimization may still be important.  It is ok for CC mode to set
this to nil IF it provides another optimization which does the same
job.  For instance, using the cache.  If that makes the same
operations fast, then we don't need the paren-in-column-0
optimization.

There is some doubt about whether your cache is getting used
for all the operations that do this.

    >... which parallels the work of `syntax-ppss', hence we currently end up
    >with two caches for the same structures - I know c-mode has to work hard
    >to handle all sorts of older (X)Emacsen ...

    Not only that, CC Mode uses its cache for things other than font-locking.

Could CC mode use `syntax-ppss' for all those things?
If so, it could get rid of its own cache and rely on the
cache in `syntax-ppss'.

    >It does fix it, and it's even pretty fast ;-).  But I still think the
    >"bug" is with the author who put the left paren in column zero of that
    >comment.  That author should be warned just as in emacs-lisp-mode.

    Should _HAVE_ been warned.  Martin Stjernholm has worked very hard to
    remove that restriction from C Mode without sacrificing (much) speed.  It
    would be a shame now to leave the restriction in Emacs 22.

I agree that if CC mode is not confused by that paren in column 0
then there is no need to flag it for the user.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-11-05  7:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-03 16:19 font-locking and open parens in column 0 Mackenzie, Alan
2006-11-03 19:29 ` Stefan Monnier
2006-11-04 11:35   ` martin rudalics
2006-11-04  9:30 ` martin rudalics
2006-11-05  7:08 ` Richard Stallman
  -- strict thread matches above, loose matches on Subject: below --
2006-11-03  8:44 AW: " Mackenzie, Alan
2006-11-03 14:14 ` Stefan Monnier
2006-11-02  8:49 Mackenzie, Alan
2006-11-02 18:31 ` martin rudalics

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).