Richard Stallman wrote:
 >     believe that I found a solution that does the right thing in most cases
 >     and will send it to you in the next days.
 >
 > Could you describe in words what it does?

Attached is a file called `lisp-font-lock-regexp.el' which contains
all the changes I propose.  You may load it, adapt the face
definitions to your requirements, and see whether it works.
Syntax-highlighting and decoration for lisp-font-lock-keywords-2 must be
activated.  Eventually someone would have to decide on appropriate
names and defaults for faces.

I have set regexp highlighting to the minimum level 1.  If this were
incorporated in font-lock.el, the standard level should be 0 - which
means no regexp highlighting and thus no obtrusiveness.  Emacs would
behave as before the introduction of regexp highlighting a couple of
weeks ago.  Level 1 does regexp highlighting as introduced recently with
some minor bug fixes.

Levels 2 and 3 should do something that was proposed in font-lock.el but
commented out due to problems with an "unbreakable endless loop".  Level
2 does this for regexp groups on a single line only.  Level 3 should
handle regexp groups spanning several lines as well.  By no means should
the default level be 3, as will become evident from the remarks below.

The variable `lisp-font-lock-regexp' can be used to set the default
level.  Individual buffer settings can be achieved by using the command
`lisp-font-lock-regexp'.


Levels 2 and 3 use the syntax-table property to remove parenthesis
syntax from unescaped parentheses and escaped brackets within regexp
groups.  I added syntax-table to `font-lock-extra-managed-props' since I
don't want font-lock to perform the extra syntactic fontification pass.
This idea is non-standard and could be defeated by anyone who removed
syntax-table from that list - so far no one seems to use syntax-table
properties in elisp-mode.

With that property, paren-matching/blinking and forward/backward-sexp
should work "as intended" within parenthetical groups.  You may have
noticed my simple-minded posting on emacs-pretest-bug about forward-sexp
not being able to handle unescaped semicolons within strings.  I
resolved the problem by setting the syntax-table property of `;' to
punctuation within regexp groups.  For a similar reason I reset the
escape syntax property of single backslashes preceding parentheses and
brackets.
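
A minimal sketch of the underlying mechanism, assuming nothing about
the attached file: `my-regexp-demote-char' and `my-regexp-demo' are
hypothetical names used only for illustration.  A character that would
otherwise count as an open parenthesis is given punctuation syntax
through the syntax-table text property, and the sexp scanner honours
this only when `parse-sexp-lookup-properties' is non-nil.

```elisp
;; Sketch only: these helpers are hypothetical and are not taken from
;; `lisp-font-lock-regexp.el'.
(defun my-regexp-demote-char (pos)
  "Give the character at POS punctuation syntax via a text property."
  (put-text-property pos (1+ pos) 'syntax-table (string-to-syntax ".")))

(defun my-regexp-demo ()
  "Return the end position of the sexp starting at position 1."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    ;; The scanner consults syntax-table text properties only when
    ;; this variable is non-nil.
    (setq-local parse-sexp-lookup-properties t)
    (insert "(a ( b)")
    ;; Position 4 holds the stray inner paren; without the property
    ;; the outer paren would have no match.
    (my-regexp-demote-char 4)
    (scan-sexps 1 1)))
```

With the property in place the outer parenthesis pairs with the final
one; without it `scan-sexps' would signal an unbalanced-parentheses
error for this text.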

I do not treat special characters "as ordinary ones if they are in
contexts where their special meanings make no sense".  Hence,
subexpressions like

\\(\\[[^]]*]\\)* in `reftex-extract-bib-entries-from-thebibliography'

\\(\\[[^\\]]*\\]\\)? in `reftex-all-used-citation-keys'

\\`\\(\\\\[sS]?.\\|\\[\\^?]?[^]]*]\\|[^\\]\\) in `gnus-score-regexp-bad-p'

\\(\[[0-9]+\] \\)* in `gud-jdb-marker-filter'

do contain mismatches.

With level 3 highlighting I'm using the font-lock-multiline property.
Apparently this property is used by `smerge.el' too.  Consequently, I
cannot simply reset the variable `font-lock-multiline' to nil when I
switch to a lower level.  I believe that this variable - and the
variable `parse-sexp-lookup-properties' as well - should be handled in a
way similar to hooks or `buffer-invisibility-spec'.  Any package that
wants to set one of these variables should add its name to a
corresponding list (creating the list if necessary) and remove its name
again when it no longer needs the setting; the variable is reset only
once the list is empty.  Routines checking the value of the variable
would not be affected by this convention.  Probably the
font-lock-multiline, syntax-table and `lisp-font-lock-regexp' prefixed
properties should be added to `yank-excluded-properties' too.
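
The list convention suggested above could look roughly like this.  It
is a sketch under assumptions: `my-multiline-users',
`my-multiline-request' and `my-multiline-release' are hypothetical
names, no such interface exists in font-lock.el, and resetting the
variable to nil (rather than to its previous value) is a
simplification.

```elisp
(require 'font-lock)

;; Hypothetical registry; not part of font-lock.el.
(defvar-local my-multiline-users nil
  "Symbols naming the packages that need `font-lock-multiline' here.")

(defun my-multiline-request (user)
  "Register USER (a symbol) and enable `font-lock-multiline'."
  (unless (memq user my-multiline-users)
    (push user my-multiline-users))
  (setq-local font-lock-multiline t))

(defun my-multiline-release (user)
  "Unregister USER; reset `font-lock-multiline' once nobody needs it."
  (setq my-multiline-users (delq user my-multiline-users))
  (unless my-multiline-users
    ;; Simplification: reset to nil instead of restoring the old value.
    (setq-local font-lock-multiline nil)))
```

Code that merely reads `font-lock-multiline' is unaffected; only code
that sets the variable has to go through the two functions.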


I've been experimenting a bit with level 3 highlighting.  With a 200MHz
PC the results are negative: Fontifying a buffer is moderately slow,
modifying text is barely tolerable.  With a 1GHz PC I did not
encounter substantial difficulties with one exception - fontifying
`cperl-init-faces' took a couple of seconds.  I tried to look a bit
closer at what's going on.

When I scrolled down through `cperl.el' and looked at what font-lock is
doing I found out that the range from position 168761 to 172839 gets
fontified no less than _seven_ times in sequence: Apparently `xdisp.c' -
encountering an unfontified object at a position START - asks
`jit-lock-function' to fontify from position START.  jit-lock-function
now calls `jit-lock-fontify-now' to fontify from START to (+ START
jit-lock-chunk-size).  The latter sets the fontified property for this
region to t.  `font-lock-default-fontify-region' detects that there is a
font-lock-multiline pattern, fontifies the entire region from beginning
to end of the pattern - the 168761 to 172839 region above - but does not
set the fontified property for this region.

I simply inserted `(put-text-property beg end 'fontified t)' in the text
of `font-lock-default-fontify-region' right before it calls
`font-lock-unfontify-region' and the problem disappeared.

When I change some text within a font-lock-multiline pattern of
`cperl-init-faces' font-lock refontifies the entire area twice which can
take a couple of seconds.  What happens here?  The first refontification
is triggered by redisplay which encounters an unfontified thing it
should display (the thing was unfontified by `jit-lock-after-change'
previously).  The second refontification is eventually triggered by
`jit-lock-context-fontify' which unfontifies everything from
`jit-lock-context-unfontify-pos' until point-max.  However, the second
refontification is useless because font-lock-default-fontify-region
already took care of the font-lock-multiline pattern.  Moreover, the
second fontification usually occurs right after the first has finished,
_before_ I am able to enter the next character.

I could resolve this by having font-lock-default-fontify-region
fontify a region iff it has not fontified exactly that region already
since the last modification of the buffer.  But font-lock-multiline
patterns do not seem suited for handling this problem anyway.  Patterns
spanning more than a couple of lines - your mileage may vary - will
delay redisplay because inserting one single character triggers
refontification of the _entire_ pattern.  It should be possible to
resolve this problem by using the `jit-lock-defer-multiline' property.
However, the latter is broken.
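
The "fontify only if exactly that region has not been fontified since
the last modification" check mentioned at the start of the preceding
paragraph might be sketched as follows.  `my-last-fontified' and
`my-fontify-region-once' are hypothetical names, and FONTIFY-FN stands
in for whatever actually does the fontification; this is not how
font-lock-default-fontify-region is implemented.

```elisp
;; Hypothetical bookkeeping; not part of font-lock.el or jit-lock.el.
(defvar-local my-last-fontified nil
  "List (TICK BEG END) describing the region fontified most recently.")

(defun my-fontify-region-once (beg end fontify-fn)
  "Call FONTIFY-FN on BEG and END unless that was done since the last edit.
Return non-nil if FONTIFY-FN was called."
  ;; `buffer-chars-modified-tick' ignores text-property changes, so
  ;; fontification itself does not invalidate the record.
  (let ((key (list (buffer-chars-modified-tick) beg end)))
    (unless (equal key my-last-fontified)
      (funcall fontify-fn beg end)
      (setq my-last-fontified key)
      t)))
```

A second request for the same region is skipped until some character
in the buffer is inserted or deleted.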

Suppose I used jit-lock-defer-multiline instead of font-lock-multiline
for my pattern.  Inserting a character now will not delay redisplay
anymore since font-lock-default-fontify-region does not cater for
jit-lock-defer-multiline.  Eventually, jit-lock-context-fontify will
unfontify the relevant parts of my buffer from the start of the pattern
to point-max, and everything should get fontified correctly.  It does
not, however, when the jit-lock-defer-multiline pattern starts _before_
`window-start': After jit-lock-context-fontify has unfontified the
buffer, redisplay - for some reason I did not investigate - intercepts
this by fontifying the _visible_ part of the buffer without caring about
my pattern.  Eventually, the invisible parts get refontified but the
already fontified part doesn't because, as mentioned before,
font-lock-default-fontify-region does not know jit-lock-defer-multiline
patterns.  Hence, fontification appears incorrect.

I'm afraid there are no simple patches for this.  Hence I provided the
appropriate warnings that level 3 highlighting should be used with
sufficient care.


The feature I propose could be quite useful for people who write regular
expressions only occasionally, and I don't want to compromise it because
of the recent controversies over the font-lock-comment-delimiter and
font-lock-negation-char-face faces.  On the other hand, I don't want to
give a pretext to anyone who plans to introduce yet another feature in
the pre-release phase.  Hence, if you think that this should be delayed
or cancelled, please tell me so.

I've also experimented with a patch of `show-paren-function' where I
overlay the backslashes in `\\(...\\)' groups with the number of the
respective group.  Hence I don't have to literally count through such
pairs when searching for the subexpressions referenced by match-string,
match-beginning, ...