From: Alan Mackenzie
To: Leo Liu
Cc: 13541@debbugs.gnu.org
Newsgroups: gmane.emacs.bugs
Subject: bug#13541: 24.2.92; awk-mode: wrong font locking regexp literals
Date: Mon, 28 Jan 2013 11:14:17 +0000

Hi, Leo.

On Mon, Jan 28, 2013 at 09:12:01AM +0800, Leo Liu wrote:
> On 2013-01-28 02:59 +0800, Alan Mackenzie wrote:

> > Yes, thanks for spotting this.  The situation was more complicated than I
> > thought.  I think this replacement patch fixes that case (together with a
> > few others).  Would you try it out again, please.

> Still fails with:

> /a/ { (print /abc/) }

Whoops!  There's a slight glitch in one of the regexps in cc-awk.el.  If
there were a space before "print", it would be "all right".  I've included
a corrected patch below.

> or

> /a/ { p /abc/ } # incorrect awk so not sure a bug or feature

That "/abc/" is two division signs with a variable between them.  :-)
Compare your text with this:

    BEGIN { a = 1 }
    /a/ { print a /a/ a }

At the moment, after an alphanumeric token, /regexp/ is only a regexp when
the token is one of the keywords ("print" "case" "return").  There might
be more such keywords (I've not found any).  In a way, "printf" could be
one too, except its first argument is always the format string, so that
wouldn't be useful.

Here's the amended patch:


=== modified file 'lisp/progmodes/cc-awk.el'
*** lisp/progmodes/cc-awk.el 2013-01-01 09:11:05 +0000
--- lisp/progmodes/cc-awk.el 2013-01-28 10:57:52 +0000
***************
*** 127,148 ****
  ;; escaped EOL.
  
  ;; REGEXPS FOR "HARMLESS" STRINGS/LINES.
- (defconst c-awk-harmless-char-re "[^_#/\"\\\\\n\r]")
- ;;   Matches any character but a _, #, /, ", \, or newline.  N.B. _" starts a
- ;; localization string in gawk 3.1
  (defconst c-awk-harmless-_ "_\\([^\"]\\|\\'\\)")
  ;;   Matches an underline NOT followed by ".
  (defconst c-awk-harmless-string*-re
    (concat "\\(" c-awk-harmless-char-re "\\|" c-awk-esc-pair-re "\\|" c-awk-harmless-_ "\\)*"))
! ;;   Matches a (possibly empty) sequence of chars without unescaped /, ", \,
! ;; #, or newlines.
  (defconst c-awk-harmless-string*-here-re
    (concat "\\=" c-awk-harmless-string*-re))
! ;; Matches the (possibly empty) sequence of chars without unescaped /, ", \,
! ;; at point.
  (defconst c-awk-harmless-line-re
!   (concat c-awk-harmless-string*-re
!           "\\(" c-awk-comment-without-nl "\\)?" c-awk-nl-or-eob))
  ;;   Matches (the tail of) an AWK \"logical\" line not containing an unescaped
  ;; " or /.  "logical" means "possibly containing escaped newlines".  A comment
  ;; is matched as part of the line even if it contains a " or a /.  The End of
--- 127,155 ----
  ;; escaped EOL.
  
  ;; REGEXPS FOR "HARMLESS" STRINGS/LINES.
  (defconst c-awk-harmless-_ "_\\([^\"]\\|\\'\\)")
  ;;   Matches an underline NOT followed by ".
+ (defconst c-awk-harmless-char-re "[^_#/\"{}();\\\\\n\r]")
+ ;;   Matches any character not significant in the state machine applying
+ ;; syntax-table properties to "s and /s.
  (defconst c-awk-harmless-string*-re
    (concat "\\(" c-awk-harmless-char-re "\\|" c-awk-esc-pair-re "\\|" c-awk-harmless-_ "\\)*"))
! ;;   Matches a (possibly empty) sequence of characters insignificant in the
! ;; state machine applying syntax-table properties to "s and /s.
  (defconst c-awk-harmless-string*-here-re
    (concat "\\=" c-awk-harmless-string*-re))
! ;; Matches the (possibly empty) sequence of "insignificant" chars at point.
! 
! (defconst c-awk-harmless-line-char-re "[^_#/\"\\\\\n\r]")
! ;;   Matches any character but a _, #, /, ", \, or newline.  N.B. _" starts a
! ;; localisation string in gawk 3.1
! (defconst c-awk-harmless-line-string*-re
!   (concat "\\(" c-awk-harmless-line-char-re "\\|" c-awk-esc-pair-re "\\|" c-awk-harmless-_ "\\)*"))
! ;;   Matches a (possibly empty) sequence of chars without unescaped /, ", \,
! ;; #, or newlines.
  (defconst c-awk-harmless-line-re
!   (concat c-awk-harmless-line-string*-re
!           "\\(" c-awk-comment-without-nl "\\)?" c-awk-nl-or-eob))
  ;;   Matches (the tail of) an AWK \"logical\" line not containing an unescaped
  ;; " or /.  "logical" means "possibly containing escaped newlines".  A comment
  ;; is matched as part of the line even if it contains a " or a /.  The End of
***************
*** 211,217 ****
  ;; division sign.
  (defconst c-awk-neutral-re
  ;  "\\([{}@` \t]\\|\\+\\+\\|--\\|\\\\.\\)+") ; changed, 2003/6/7
!   "\\([{}@` \t]\\|\\+\\+\\|--\\|\\\\.\\)")
  ;;   A "neutral" char(pair).  Doesn't change the "state" of a subsequent /.
  ;; This is space/tab, braces, an auto-increment/decrement operator or an
  ;; escaped character.  Or one of the (invalid) characters @ or `.  But NOT an
--- 218,224 ----
  ;; division sign.
  (defconst c-awk-neutral-re
  ;  "\\([{}@` \t]\\|\\+\\+\\|--\\|\\\\.\\)+") ; changed, 2003/6/7
!   "\\([}@` \t]\\|\\+\\+\\|--\\|\\\\\\(.\\|[\n\r]\\)\\)")
  ;;   A "neutral" char(pair).  Doesn't change the "state" of a subsequent /.
  ;; This is space/tab, braces, an auto-increment/decrement operator or an
  ;; escaped character.  Or one of the (invalid) characters @ or `.  But NOT an
***************
*** 231,238 ****
  ;; will only work when there won't be a preceding " or / before the sought /
  ;; to foul things up.
  (defconst c-awk-non-arith-op-bra-re
!   "[[\(&=:!><,?;'~|]")
! ;;   Matches an opening BRAcket, round or square, or any operator character
  ;; apart from +,-,/,*,%.  For the purpose at hand (detecting a / which is a
  ;; regexp bracket) these arith ops are unnecessary and a pain, because of "++"
  ;; and "--".
--- 238,245 ----
  ;; will only work when there won't be a preceding " or / before the sought /
  ;; to foul things up.
  (defconst c-awk-non-arith-op-bra-re
!   "[[\({&=:!><,?;'~|]")
! ;;   Matches an opening BRAcket, round or square, or any operator character
  ;; apart from +,-,/,*,%.  For the purpose at hand (detecting a / which is a
  ;; regexp bracket) these arith ops are unnecessary and a pain, because of "++"
  ;; and "--".
***************
*** 242,247 ****
--- 249,264 ----
  ;; bracket, in a context where an immediate / would be a division sign.  This
  ;; will only work when there won't be a preceding " or / before the sought /
  ;; to foul things up.
+ (defconst c-awk-pre-exp-alphanum-kwd-re
+   (concat "\\(^\\|\\=\\|[^_\n\r]\\)\\<"
+           (regexp-opt '("print" "return" "case") t)
+           "\\>\\([^_\n\r]\\|$\\)"))
+ ;;   Matches all AWK keywords which can precede expressions (including
+ ;; /regexp/).
+ (defconst c-awk-kwd-regexp-sign-re
+   (concat c-awk-pre-exp-alphanum-kwd-re c-awk-neutrals*-re "/"))
+ ;;   Matches a piece of AWK buffer ending in <kwd> /, where <kwd> is a keyword
+ ;; which can precede an expression.
  
  ;; REGEXPS USED FOR FINDING THE POSITION OF A "virtual semicolon"
  (defconst c-awk-_-harmless-nonws-char-re "[^#/\"\\\\\n\r \t]")
***************
*** 721,729 ****
      (goto-char anchor)
      ;; Analyze the line to find out what the / is.
      (if (if anchor-state-/div
!             (not (search-forward-regexp c-awk-regexp-sign-re (1+ /point) t))
!           (search-forward-regexp c-awk-div-sign-re (1+ /point) t))
!         ;; A division sign.
          (progn (goto-char (1+ /point)) nil)
        ;; A regexp opener
        ;; Jump over the regexp innards, setting the match data.
--- 738,747 ----
      (goto-char anchor)
      ;; Analyze the line to find out what the / is.
      (if (if anchor-state-/div
!             (not (search-forward-regexp c-awk-regexp-sign-re (1+ /point) t))
!           (and (not (search-forward-regexp c-awk-kwd-regexp-sign-re (1+ /point) t))
!                (search-forward-regexp c-awk-div-sign-re (1+ /point) t)))
!         ;; A division sign.
          (progn (goto-char (1+ /point)) nil)
        ;; A regexp opener
        ;; Jump over the regexp innards, setting the match data.
***************
*** 776,787 ****
                (< (point) lim))
        (setq anchor (point))
        (search-forward-regexp c-awk-harmless-string*-here-re nil t)
!       ;; We are now looking at either a " or a /.
!       ;; Do our thing on the string, regexp or division sign.
        (setq anchor-state-/div
!             (if (looking-at "_?\"")
!                 (c-awk-syntax-tablify-string)
!               (c-awk-syntax-tablify-/ anchor anchor-state-/div))))
      nil))
  
  ;; ACM, 2002/07/21: Thoughts: We need an AWK Mode after-change function to set
--- 794,813 ----
                (< (point) lim))
        (setq anchor (point))
        (search-forward-regexp c-awk-harmless-string*-here-re nil t)
!       ;; We are now looking at either a " or a / or a brace/paren/semicolon.
!       ;; Do our thing on the string, regexp or division sign or update our state.
        (setq anchor-state-/div
!             (cond
!              ((looking-at "_?\"")
!               (c-awk-syntax-tablify-string))
!              ((eq (char-after) ?/)
!               (c-awk-syntax-tablify-/ anchor anchor-state-/div))
!              ((memq (char-after) '(?{ ?} ?\( ?\;))
!               (forward-char)
!               nil)
!              (t                         ; ?\)
!               (forward-char)
!               t))))
      nil))
  
  ;; ACM, 2002/07/21: Thoughts: We need an AWK Mode after-change function to set


> Leo

-- 
Alan Mackenzie (Nuremberg, Germany).