unofficial mirror of emacs-devel@gnu.org 
From: martin rudalics <rudalics@gmx.at>
Subject: regexp font-lock highlighting
Date: Mon, 30 May 2005 10:41:25 +0200
Message-ID: <429AD1B5.1020408@gmx.at>

The recent modification of `lisp-font-lock-keywords-2' to highlight
subexpressions of regexps has two minor bugs:

(1) If you attempt to write the regexp to match the string "\\)" as
     "\\\\\\\\)", the last three chars of that regexp are highlighted with
     `font-lock-comment-face'.

(2) If the region enclosed by the arguments START and END of
     `font-lock-fontify-keywords-region' contains one of "\\(", "\\|",
     "\\)" within a comment, doc-string, or key definition, all
     subsequent occurrences within a normal string are _not_ highlighted.
     `font-lock-fontify-keywords-region' calls your lambda with point at
     START; the lambda finds that first occurrence and decides that it
     should not be highlighted since it has the wrong face, and
     `font-lock-fontify-keywords-region' then wrongly concludes that no
     such expression exists up to END.

The following lambda should avoid these problems:

        ((lambda (bound)
           (catch 'found
             (while (re-search-forward "\\(\\\\\\\\\\)\\(?:\\(\\\\\\\\\\)\\|\\([(|)]\\)\\(\\?:\\)?\\)" bound t)
               (unless (match-beginning 2)
                 (let ((face (get-text-property (1- (point)) 'face)))
                   (when (or (and (listp face)
                                  (memq 'font-lock-string-face face))
                             (eq 'font-lock-string-face face))
                     (throw 'found t)))))))
         ;; Should we introduce a lowlight face for this?
         ;; Ideally that would retain the color, dimmed.
         (1 'font-lock-comment-face prepend)
         (3 'bold prepend)
         (4 font-lock-type-face prepend t))
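
The lambda above avoids bug (1) because its regexp consumes the backslashes of a string's source text two at a time: a backslash pair followed by another pair is matched as group 2 and skipped, so in "\\\\\\\\)" the eight backslashes are eaten as two escaped pairs and the final ")" is never treated as a grouping construct.  The function below is a minimal sketch that applies the same regexp (minus the "?:" group) to plain text instead of a buffer, so the scanning order can be tried in isolation; the name `my-regexp-specials' is hypothetical, and its input is assumed to be the source text of a Lisp string.

```elisp
;; Hypothetical function for experimenting outside font-lock.
(defun my-regexp-specials (text)
  "Return the grouping characters found in TEXT, in order.
TEXT is assumed to be the source text of a Lisp string literal, in
which each regexp backslash is written as two backslashes.  A
backslash pair followed by another pair is an escaped backslash and
is skipped, so it never makes the next \"(\", \"|\" or \")\" special."
  (let ((re "\\(\\\\\\\\\\)\\(?:\\(\\\\\\\\\\)\\|\\([(|)]\\)\\)")
        (start 0)
        (result '()))
    (while (string-match re text start)
      ;; Group 2 matched: an escaped backslash, nothing to report.
      ;; Group 3 matched: a "(", "|" or ")" preceded by one pair.
      (when (match-beginning 3)
        (push (match-string 3 text) result))
      (setq start (match-end 0)))
    (nreverse result)))
```

With the source text of bug (1)'s regexp "\\\\\\\\)" the function reports nothing, whereas the source text of "\\(foo\\|bar\\)" gives ("(" "|" ")").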



Moreover, I don't think that anything is "broken" in the following:

        ;; Underline innermost grouping, so that you can more easily see what
        ;; belongs together.  2005-05-12: Font-lock can go into an
        ;; unbreakable endless loop on this -- something's broken.
        ;;("[\\][\\][(]\\(?:\\?:\\)?\\(\\(?:[^\\\"]+\\|[\\]\\(?:[^\\]\\|[\\][^(]\\)\\)+?\\)[\\][\\][)]"
	 ;;1 'underline prepend)

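I believe that the regexp search run by `font-lock-fontify-keywords-region'
starts backtracking (the "[^\\\"]+" nested inside the outer "+?" gives a
failing match exponentially many ways to split a run of ordinary
characters), and this can take hours in more complicated cases.  In any
case, regexps are not suited to matching nested groupings.  If you are
willing to pay for two additional buffer-local variables such as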

(defvar regexp-left-paren nil
   "Position of innermost unmatched \"\\\\(\".
The value of this variable is valid iff `regexp-left-paren-end' equals the upper
bound of the region `font-lock-fontify-keywords-region' currently investigates.")
(make-variable-buffer-local 'regexp-left-paren)

(defvar regexp-left-paren-end 0
   "Buffer position indicating whether the value of `regexp-left-paren' is valid.
If the value of this variable equals the upper bound of the region
investigated by `font-lock-fontify-keywords-region', the current value of
`regexp-left-paren' is valid.")
(make-variable-buffer-local 'regexp-left-paren-end)

the following modification of the above lambda expression should handle
this problem:

        ((lambda (bound)
           (catch 'found
             (while (re-search-forward
                     "\\(\\\\\\\\\\)\\(?:\\(\\\\\\\\\\)\\|\\(\\((\\)\\|\\(|\\)\\|\\()\\)\\)\\)" bound t)
               (when (match-beginning 3)
                 (let ((face (get-text-property (1- (point)) 'face))
                       match-data-length)
                   (when (or (and (listp face)
                                  (memq 'font-lock-string-face face))
                             (eq 'font-lock-string-face face))
                     (cond
                      ((match-beginning 4) ; \\(
                       (setq regexp-left-paren (match-end 4))
                       (setq regexp-left-paren-end bound)
                       (set-match-data
                        (append (butlast (match-data) 2)
                                (list (point-min-marker) (point-min-marker)))))
                      ((match-beginning 5) ; \\|
                       (set-match-data
                        (append (butlast (match-data) 4)
                                (list (point-min-marker) (point-min-marker)))))
                      ((match-beginning 6) ; \\)
                       (set-match-data
                        (append (butlast (match-data) 6)
                                (if (= regexp-left-paren-end bound)
                                    (list (copy-marker regexp-left-paren) (match-beginning 6))
                                  (list (point-min-marker) (point-min-marker)))))
                       (setq regexp-left-paren nil)
                       (setq regexp-left-paren-end 0)))
                     (throw 'found t)))))))
         ;; Should we introduce a lowlight face for this?
         ;; Ideally that would retain the color, dimmed.
         (1 'font-lock-comment-face prepend)
         (3 'bold prepend)
         (4 'underline prepend))

I have tried this on some elisp files on which the original solution
choked, and did not encounter any problems.  Note that I removed the
"\\(\\?:\\)?" since I find it distracting to put yet another face here.
If you believe that you _really_ need it, you will have to reinsert it,
but then you have to adjust the match-data cropping as well.  (I do
have to modify the match data, since redisplay needs valid buffer
positions for highlighting.)
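
The two variables rest on the observation that, scanning left to right, the most recent unmatched "\\(" together with the next "\\)" always delimits an innermost grouping, so no stack of open parens is needed.  The function below is a minimal sketch of that pairing on plain text, assumed to be the source text of a Lisp string; the name `my-innermost-groups' is hypothetical, and the sketch ignores "\\|" and does not reproduce the match-data cropping that font-lock needs.

```elisp
;; Hypothetical function illustrating innermost-group pairing.
(defun my-innermost-groups (text)
  "Return (START . END) pairs delimiting innermost groupings in TEXT.
TEXT is assumed to be the source text of a Lisp string.  START is
just after an opening paren, END is where the backslashes of the
next closing paren begin, and no other opening paren lies between."
  (let ((re "\\(\\\\\\\\\\)\\(?:\\(\\\\\\\\\\)\\|\\([()]\\)\\)")
        (start 0)
        (left nil)        ; contents start of the latest unmatched opener
        (groups '()))
    (while (string-match re text start)
      (setq start (match-end 0))
      (cond
       ;; Escaped backslash: ignore.
       ((null (match-beginning 3)))
       ;; Opening paren: remember where its contents start.
       ((string= (match-string 3 text) "(")
        (setq left (match-end 3)))
       ;; Closing paren with no opener in between: innermost group.
       (left
        (push (cons left (match-beginning 0)) groups)
        (setq left nil))))
    (nreverse groups)))
```

For the source text of "\\(a\\(b\\)c\\)" this returns ((7 . 8)), the position of "b"; the outer closing paren is not paired because the pending opener was already consumed.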



Finally, I would use three distinct font-lock faces for regexps:

- One face for highlighting the "\\"s, which by default should inherit
   from `font-lock-string-face' with a dimmed foreground; I'm using
   Green4 for strings and PaleGreen3 for the "\\"s.  Anyone who doesn't
   like the highlighting could revert to `font-lock-string-face'.

- One face for highlighting the "(", "|" and ")" in these expressions.
   I find `bold' good here, but again I would leave it to the user
   whether to turn this highlighting off.  Moreover, such a face would
   allow paren matching to _never_ pair a paren with that face with a
   paren with another face.  Consequently, paren matching could finally
   provide more trustworthy information within regular expressions.

- One face for highlighting the innermost grouping.  `underline' is
   not bad here, but it looks a bit noisy in multiline expressions or in
   things like

   (concat "\\("
           some-string
           "\\)")

   I'm using a background slightly darker than the default background,
   which gives regular expressions a very distinctive appearance.  In any
   case, users should be able to turn this highlighting off by using the
   default face.
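
The three faces could be declared roughly as follows.  This is a minimal sketch: the face names are hypothetical, PaleGreen3 and `bold' are the choices described above, and "grey92" is only an assumed stand-in for a background slightly darker than a light default background.

```elisp
;; Hypothetical face names; none of these are part of Emacs.
(defface my-regexp-backslash-face
  '((t :inherit font-lock-string-face :foreground "PaleGreen3"))
  "Face for the backslashes of regexp constructs in strings."
  :group 'font-lock-faces)

(defface my-regexp-grouping-construct-face
  '((t :inherit bold))
  "Face for the \"(\", \"|\" and \")\" of regexp constructs."
  :group 'font-lock-faces)

(defface my-regexp-innermost-group-face
  ;; "grey92" is an assumed color for a light default background;
  ;; on other displays fall back to underlining.
  '((((class color) (background light)) :background "grey92")
    (t :underline t))
  "Face for the contents of the innermost regexp grouping."
  :group 'font-lock-faces)
```

A keyword spec would then refer to these instead of fixed faces, e.g. `(3 'my-regexp-grouping-construct-face prepend)' in place of `(3 'bold prepend)', and users could customize each face independently.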


Thread overview: 18+ messages
2005-05-30  8:41 martin rudalics [this message]
2005-05-31  2:45 ` regexp font-lock highlighting Daniel Brockman
2005-06-01  9:39 ` Richard Stallman
2005-06-04  8:11   ` martin rudalics
2005-06-04 17:59     ` Richard Stallman
2005-06-06  9:33       ` martin rudalics
2005-06-11 23:17         ` Richard Stallman
2005-06-15 16:00           ` martin rudalics
2005-07-03  0:09             ` Juri Linkov
2005-07-03  4:10               ` Luc Teirlinck
2005-07-03  6:03               ` Eli Zaretskii
2005-07-03  9:10                 ` martin rudalics
2005-07-04  0:09                   ` Miles Bader
2005-06-06 13:05 ` Juri Linkov
2005-06-08 15:13   ` martin rudalics
2005-06-08 20:34     ` Juri Linkov
2005-06-08 22:42       ` Stefan Monnier
2005-06-08 23:32         ` Juri Linkov
