From: martin rudalics
Newsgroups: gmane.emacs.devel
Subject: regexp font-lock highlighting
Date: Mon, 30 May 2005 10:41:25 +0200
Message-ID: <429AD1B5.1020408@gmx.at>

The recent modification of `lisp-font-lock-keywords-2' to highlight
subexpressions of regexps has two minor bugs:

(1) If you attempt to write the regexp to match the string "\\)" as
    "\\\\\\\\)" the last three chars of that regexp are highlighted
    with `font-lock-comment-face'.

(2) If the region enclosed by the arguments START and END of
    `font-lock-fontify-keywords-region' contains one of "\\(", "\\|",
    "\\)" within a comment, doc-string, or key definition, all
    subsequent occurrences within a normal string are _not_
    highlighted.  `font-lock-fontify-keywords-region' goes to START
    when it evaluates your lambda, decides that the expression should
    not get highlighted since it has the wrong face, and wrongly
    concludes that no such expression exists up to END.

The following lambda should avoid these problems:

((lambda (bound)
   (catch 'found
     (while (re-search-forward "\\(\\\\\\\\\\)\\(?:\\(\\\\\\\\\\)\\|\\([(|)]\\)\\(\\?:\\)?\\)" bound t)
       (unless (match-beginning 2)
         (let ((face (get-text-property (1- (point)) 'face)))
           (when (or (and (listp face)
                          (memq 'font-lock-string-face face))
                     (eq 'font-lock-string-face face))
             (throw 'found t)))))))
 ;; Should we introduce a lowlight face for this?
 ;; Ideally that would retain the color, dimmed.
 (1 'font-lock-comment-face prepend)
 (3 'bold prepend)
 (4 font-lock-type-face prepend t))

Moreover I don't think that anything is "broken" in the following:

;; Underline innermost grouping, so that you can more easily see what
;; belongs together.  2005-05-12: Font-lock can go into an
;; unbreakable endless loop on this -- something's broken.
;;("[\\][\\][(]\\(?:\\?:\\)?\\(\\(?:[^\\\"]+\\|[\\]\\(?:[^\\]\\|[\\][^(]\\)\\)+?\\)[\\][\\][)]"
;;1 'underline prepend)

I believe that `font-lock-fontify-keywords-region' starts backtracking
and this can take hours in more complicated cases.  Anyway, regexps are
not suited to handle this.  If you are willing to pay for two additional
buffer-local variables such as

(defvar regexp-left-paren nil
  "Position of innermost unmatched \"\\\\(\".
The value of this variable is valid iff `regexp-left-paren-end' equals
the upper bound of the region `font-lock-fontify-keywords-region'
currently investigates.")
(make-variable-buffer-local 'regexp-left-paren)

(defvar regexp-left-paren-end 0
  "Buffer position indicating whether the value of `regexp-left-paren' is valid.
If the value of this variable equals the value of the upper bound of the
region investigated by `font-lock-fontify-keywords-region' the current
value of `regexp-left-paren' is valid.")
(make-variable-buffer-local 'regexp-left-paren-end)

the following modification of the above lambda expression should handle
this problem:

((lambda (bound)
   (catch 'found
     (while (re-search-forward "\\(\\\\\\\\\\)\\(?:\\(\\\\\\\\\\)\\|\\(\\((\\)\\|\\(|\\)\\|\\()\\)\\)\\)" bound t)
       (when (match-beginning 3)
         (let ((face (get-text-property (1- (point)) 'face))
               match-data-length)
           (when (or (and (listp face)
                          (memq 'font-lock-string-face face))
                     (eq 'font-lock-string-face face))
             (cond
              ((match-beginning 4)      ; \\(
               (setq regexp-left-paren (match-end 4))
               (setq regexp-left-paren-end bound)
               (set-match-data
                (append (butlast (match-data) 2)
                        (list (point-min-marker) (point-min-marker)))))
              ((match-beginning 5)      ; \\|
               (set-match-data
                (append (butlast (match-data) 4)
                        (list (point-min-marker) (point-min-marker)))))
              ((match-beginning 6)      ; \\)
               (set-match-data
                (append (butlast (match-data) 6)
                        (if (= regexp-left-paren-end bound)
                            (list (copy-marker regexp-left-paren)
                                  (match-beginning 6))
                          (list (point-min-marker) (point-min-marker)))))
               (setq regexp-left-paren nil)
               (setq regexp-left-paren-end 0)))
             (throw 'found t)))))))
 ;; Should we introduce a lowlight face for this?
 ;; Ideally that would retain the color, dimmed.
 (1 'font-lock-comment-face prepend)
 (3 'bold prepend)
 (4 'underline prepend))

I have tried this on some elisp files which had the original solution
choke and did not encounter any problems.  Note that I removed the
"\\(\\?:\\)?" since I find it distracting to put yet another face here.
If you believe that you _really_ need it you will have to reinsert it,
but in that case you have to modify match-data cropping as well.  (I do
have to modify match-data since redisplay wants some valid buffer
positions for highlighting.)
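
The match-data cropping mentioned above depends on how `match-data'
reports groups: it leaves out trailing groups that did not take part in
the match, so the number of elements to drop with `butlast' differs for
each alternative.  The following is only a small sketch of that idea on
strings.  The regexp and the `demo-' names are hypothetical and not part
of the proposed keyword, and integers stand in for the markers used in a
buffer.

```elisp
;; Hypothetical demo regexp: group 1 = "x", group 2 = "a", group 3 = "b".
(defconst demo-cropping-regexp "\\(x\\)\\(?:\\(a\\)\\|\\(b\\)\\)")

(defun demo-match-data (string)
  "Return the match data after matching the demo regexp against STRING."
  (save-match-data
    (when (string-match demo-cropping-regexp string)
      (match-data))))

(defun demo-crop-groups (data n-groups)
  "Drop the last N-GROUPS groups from DATA and append one empty group.
The empty group plays the role of the (point-min, point-min) pair."
  (append (butlast data (* 2 n-groups)) (list 0 0)))

;; Group 3 did not match and is the last group, so it is not reported:
(demo-match-data "xa")                       ; => (0 2 0 1 1 2)
;; Group 2 did not match but group 3 did, so group 2 shows up as nil nil:
(demo-match-data "xb")                       ; => (0 2 0 1 nil nil 1 2)

;; Crop one group in the first case and two in the second; both end up
;; with the same shape, where group 2 is a valid but empty range.
(demo-crop-groups (demo-match-data "xa") 1)  ; => (0 2 0 1 0 0)
(demo-crop-groups (demo-match-data "xb") 2)  ; => (0 2 0 1 0 0)
```

Reinserting an optional group such as "\\(\\?:\\)?" would shift these
counts, which is why the cropping would have to be adjusted with it.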

Finally, I would use three distinct font-lock faces for regexps:

- One face for highlighting the "\\"s which by default should inherit
  from `font-lock-string-face' with a dimmed foreground - I'm using
  Green4 for strings and PaleGreen3 for the "\\"s.  Anyone who doesn't
  like the highlighting could revert to `font-lock-string-face'.

- One face for highlighting the "(", "|" and ")" in these expressions.
  I find `bold' good here but again would leave it to the user whether
  she wants to turn this highlighting off.  Moreover, such a face could
  allow paren-highlighting to _never_ match a paren having that face
  with a paren having another face.  Consequently, paren-matching could
  finally provide more trustworthy information within regular
  expressions.

- One face for highlighting the innermost grouping.  Basically,
  `underline' is not bad here but appears a bit noisy in multiline
  expressions or things like

  (concat "\\(" some-string "\\)")

  I'm using a background which is slightly darker than the default
  background and gives regular expressions a very distinctive
  appearance.  Anyway, users should be allowed to turn highlighting off
  by using the default face.
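
The three faces proposed above could be sketched roughly as follows.
This is only an illustration: the `demo-regexp-' face names are
hypothetical, and "gray90" is an assumed stand-in for "slightly darker
than the default background" on a light display.

```elisp
;; Hypothetical face for the "\\"s: string face with a dimmed foreground.
(defface demo-regexp-backslash
  '((t :inherit font-lock-string-face :foreground "PaleGreen3"))
  "Face for the \"\\\\\" that introduces a regexp grouping construct."
  :group 'font-lock-faces)

;; Hypothetical face for "(", "|" and ")": bold by default.
(defface demo-regexp-construct
  '((t :inherit bold))
  "Face for the \"(\", \"|\" and \")\" of a regexp grouping construct."
  :group 'font-lock-faces)

;; Hypothetical face for the innermost grouping.  The background color
;; is an assumption and would need a variant for dark backgrounds.
(defface demo-regexp-innermost-group
  '((t :background "gray90"))
  "Face for the innermost regexp grouping."
  :group 'font-lock-faces)
```

With faces like these, the keyword entries would read, for example,
(1 'demo-regexp-backslash prepend) instead of naming
`font-lock-comment-face' directly, and a user who dislikes one of them
could customize just that face.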