From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Newsgroups: gmane.emacs.bugs Subject: bug#13369: 24.1; compile message parsing slow because of omake hack Date: Wed, 9 Jan 2013 15:31:06 +0100 Message-ID: <7B5D3D47-4978-498F-905C-CB34B82D8FE1@bredband.net> References: <672E6DB1-196D-491C-BE42-F29CF1C7F1A5@bredband.net> <3FB0B562-759B-4486-90F4-789BF6CEA07F@bredband.net> <0AEEF1C6-8150-4E39-9CD9-30EC963D64D6@bredband.net> <147C0505-44B3-4A7B-8328-A7B27D207ABD@bredband.net> <87k3rmpj5a.fsf@gmail.com> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 (Apple Message framework v936) Content-Type: multipart/mixed; boundary=Apple-Mail-3--669938581 X-Trace: ger.gmane.org 1357741928 4088 80.91.229.3 (9 Jan 2013 14:32:08 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Wed, 9 Jan 2013 14:32:08 +0000 (UTC) Cc: 13369@debbugs.gnu.org To: Jambunathan K Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Wed Jan 09 15:32:25 2013 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Tswhc-0001mi-Rg for geb-bug-gnu-emacs@m.gmane.org; Wed, 09 Jan 2013 15:32:25 +0100 Original-Received: from localhost ([::1]:43607 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TswhN-00065F-2j for geb-bug-gnu-emacs@m.gmane.org; Wed, 09 Jan 2013 09:32:09 -0500 Original-Received: from eggs.gnu.org ([208.118.235.92]:45344) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TswhE-000653-PY for bug-gnu-emacs@gnu.org; Wed, 09 Jan 2013 09:32:07 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TswhB-0001EO-R4 for bug-gnu-emacs@gnu.org; Wed, 09 Jan 2013 09:32:00 -0500 Original-Received: from debbugs.gnu.org ([140.186.70.43]:44918) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TswhB-0001ED-JD for bug-gnu-emacs@gnu.org; Wed, 09 Jan 2013 09:31:57 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.72) (envelope-from ) id 1TswhI-0001cY-Hx for bug-gnu-emacs@gnu.org; Wed, 09 Jan 2013 09:32:05 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 09 Jan 2013 14:32:03 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 13369 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 13369-submit@debbugs.gnu.org id=B13369.13577419186216 (code B ref 13369); Wed, 09 Jan 2013 14:32:03 +0000 Original-Received: (at 13369) by debbugs.gnu.org; 9 Jan 2013 14:31:58 +0000 Original-Received: from localhost ([127.0.0.1]:58159 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tswh9-0001cB-FK for submit@debbugs.gnu.org; Wed, 09 Jan 2013 09:31:56 -0500 Original-Received: from smtprelay-b21.telenor.se ([195.54.99.212]:37107) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tswh1-0001bt-QN for 13369@debbugs.gnu.org; Wed, 09 Jan 2013 09:31:50 -0500 Original-Received: from ipb5.telenor.se (ipb5.telenor.se [195.54.127.168]) by smtprelay-b21.telenor.se (Postfix) with ESMTP id CE2F0E88EB for <13369@debbugs.gnu.org>; Wed, 9 Jan 2013 15:31:31 +0100 (CET) X-SMTPAUTH-B2: [mategn] X-SENDER-IP: [85.229.35.241] X-LISTENER: [smtp.bredband.net] X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AoDSAEl+7VBV5SPxPGdsb2JhbABEgkGBBoIshQCxWQQDfxYDAQEBATg0gh4BAQQBeQULC0ZDFAaIJAq2FJAvYQOPCJpC X-IronPort-AV: E=Sophos;i="4.84,438,1355094000"; d="el'?scan'208";a="258247385" Original-Received: from c-f123e555.032-29-73746f10.cust.bredbandsbolaget.se ([85.229.35.241]) by ipb5.telenor.se with ESMTP; 09 Jan 2013 15:31:07 +0100 In-Reply-To: <87k3rmpj5a.fsf@gmail.com> X-Mailer: Apple Mail (2.936) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 140.186.70.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:69542 Archived-At: --Apple-Mail-3--669938581 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit > Why not just share, instead of saying that you will be happy to do so. Sorry, I just assumed that someone already wrote such a thing and that it would be more polished than my amateurish attempt. Here it is. --Apple-Mail-3--669938581 Content-Disposition: attachment; filename=xr.el Content-Type: application/octet-stream; x-unix-mode=0644; name="xr.el" Content-Transfer-Encoding: 7bit ;; xr - convert string regexp to rx notation (require 'rx) (defun xr-parse-char-alt () (let ((set nil)) (when (looking-at "]") (forward-char 1) (setq set (list "]"))) (while (not (looking-at "]")) (cond ;; character class ((looking-at (rx "[:" (group (one-or-more letter)) ":]")) (let* ((sym (intern (match-string 1))) (rx-sym (cond ((eq sym 'unibyte) 'ascii) ((eq sym 'multibyte) 'nonascii) (t sym)))) (setq set (cons sym set)) (goto-char (match-end 0)))) ;; character range ((looking-at (rx (not (any "]")) "-" (not (any "]")))) (let ((range (match-string 0))) ;; We render [0-9] as (any "0-9") instead of (any (?0 . ?9)) ;; for readability and brevity, and because the latter would ;; become (48 . 57) when printed. (setq set (cons range set)) (goto-char (match-end 0)))) ((looking-at (rx eos)) (error "unterminated character alternative")) ;; plain character (including ^ or -) (t (setq set (cons (char-to-string (following-char)) set)) (forward-char 1)))) ;; FIXME: combine several characters into one string (if there is no "-"), ;; like (any "a" "b") -> (any "ab") set)) ;; Reverse a sequence and concatenate adjacent strings. (defun xr-rev-join-seq (rev-seq) (let ((seq nil)) (while rev-seq (if (and (stringp (car rev-seq)) (stringp (car seq))) (setq seq (cons (concat (car rev-seq) (car seq)) (cdr seq))) (setq seq (cons (car rev-seq) seq))) (setq rev-seq (cdr rev-seq))) seq)) (defun xr-parse-seq () (let ((sequence nil)) ; reversed (while (not (looking-at (rx (or "\\|" "\\)" eos)))) (cond ;; nonspecial character ((looking-at (rx (not (any "\\*+?.^$[")))) (forward-char 1) (setq sequence (cons (match-string 0) sequence))) ;; escaped special ((looking-at (rx "\\" (group (any "\\*+?.^$[")))) (forward-char 2) (setq sequence (cons (match-string 1) sequence))) ;; group ((looking-at (rx "\\(" (opt (group "?" (group (zero-or-more digit)) ":")))) (let ((question (match-string 1)) (number (match-string 2)) (end (match-end 0))) (goto-char end) (let* ((group (xr-parse-alt)) ;; optimise - group has an implicit seq (operand (if (and (listp group) (eq (car group) 'seq)) (cdr group) (list group)))) (when (not (looking-at (rx "\\)"))) (error "missing \\)")) (forward-char 2) (let ((item (cond ((not question) ; plain subgroup (cons 'group operand)) ((zerop (length number)) ; shy group group) (t (append (list 'group-n (string-to-number number)) operand))))) (setq sequence (cons item sequence)))))) ;; * ? + (and non-greedy variants) ((looking-at (rx (group (any "*?+")) (opt (group "?")))) (let ((op (match-string 1)) (non-greedy (match-string 2))) (goto-char (match-end 0)) (when (null sequence) (error "postfix operator without operand")) ;; While we could use the same symbols as the operator in the regexp, ;; ? needs to be escaped in symbols and isn't very neat, so we ;; assume that rx-greedy-flag is set. (let* ((sym (cdr (assoc op '(("*" . zero-or-more) ("+" . one-or-more) ("?" . opt))))) (operand (car sequence)) ;; Optimise when the operand is (seq ...) (item (if (and (listp operand) (eq (car operand) 'seq)) (cons sym (cdr operand)) (list sym operand)))) ;; BUG: minimal-match affects everything inside, which is not ;; what we want. Either keep track of the stuff inside and insert ;; maximal-match as appropriate (messy!) or just use the ;; *?, ?? and +? symbols. (setq sequence (cons (if non-greedy (list 'minimal-match item) item) (cdr sequence)))))) ;; \{..\} ((looking-at (rx "\\{" (or (group (one-or-more digit)) (seq (opt (group (one-or-more digit))) "," (opt (group (one-or-more digit))))) "\\}")) (when (null sequence) (error "repetition without operand")) (let ((exactly (match-string 1)) (lower (match-string 2)) (upper (match-string 3))) (goto-char (match-end 0)) (let ((op (cond (exactly (list '= (string-to-number exactly))) ((and lower upper) (list 'repeat (string-to-number lower) (string-to-number upper))) (lower (list '>= (string-to-number lower))) (upper (list 'repeat 0 (string-to-number upper))) (t (list 'zero-or-more))))) (setq sequence (cons (append op (list (car sequence))) (cdr sequence)))))) ;; character alternative ((looking-at (rx "[" (opt (group "^")))) (goto-char (match-end 0)) ;; FIXME: optimise (any digit) -> digit etc (let* ((negated (match-string 1)) (set (cons 'any (xr-parse-char-alt)))) (forward-char 1) (setq sequence (cons (if negated (list 'not set) set) sequence)))) ;; backref ((looking-at (rx "\\" (group digit))) (forward-char 2) (setq sequence (cons (list 'backref (string-to-number (match-string 1))) sequence))) ;; various simple substitutions ((looking-at (rx (or "." "$" "^" "\\w" "\\W" "\\`" "\\'" "\\=" "\\b" "\\B" "\\<" "\\>" "\\_<" "\\_>"))) (goto-char (match-end 0)) (let ((sym (cdr (assoc (match-string 0) '(("." . nonl) ("^" . bol) ("$" . eol) ("\\w" . wordchar) ("\\W" . not-wordchar) ("\\`" . bos) ("\\'" . eos) ("\\=" . point) ("\\b" . word-boundary) ("\\B" . not-word-boundary) ("\\<" . bow) ("\\>" . eow) ("\\_<" . symbol-start) ("\\_>" . symbol-end)))))) (setq sequence (cons sym sequence)))) ;; character syntax ((looking-at (rx "\\" (group (any "sS")) (group anything))) (let ((negated (string-equal (match-string 1) "S")) (syntax-code (match-string 2))) (goto-char (match-end 0)) (let ((sym (assoc syntax-code '(("-" . whitespace) ("." . punctuation) ("w" . word) ("_" . symbol) ("(" . open-parenthesis) (")" . close-parenthesis) ("'" . expression-prefix) ("\"" . string-quote) ("$" . paired-delimiter) ("\\" . escape) ("/" . character-quote) ("<" . comment-start) (">" . comment-end) ("|" . string-delimiter) ("!" . comment-delimiter))))) (when (not sym) (error "unknown syntax code: %s" syntax-code)) (let ((item (list 'syntax (cdr sym)))) (setq sequence (cons (if negated (list 'not item) item) sequence)))))) ;; character categories ((looking-at (rx "\\" (group (any "cC")) (group anything))) (let ((negated (string-equal (match-string 1) "C")) (category-code (match-string 2))) (goto-char (match-end 0)) (let ((sym (assoc category-code '(("0" . consonant) ("1" . base-vowel) ("2" . upper-diacritical-mark) ("3" . lower-diacritical-mark) ("4" . tone-mark) ("5" . symbol) ("6" . digit) ("7" . vowel-modifying-diacritical-mark) ("8" . vowel-sign) ("9" . semivowel-lower) ("<" . not-at-end-of-line) (">" . not-at-beginning-of-line) ("A" . alpha-numeric-two-byte) ("C" . chinse-two-byte) ("G" . greek-two-byte) ("H" . japanese-hiragana-two-byte) ("I" . indian-tow-byte) ("K" . japanese-katakana-two-byte) ("N" . korean-hangul-two-byte) ("Y" . cyrillic-two-byte) ("^" . combining-diacritic) ("a" . ascii) ("b" . arabic) ("c" . chinese) ("e" . ethiopic) ("g" . greek) ("h" . korean) ("i" . indian) ("j" . japanese) ("k" . japanese-katakana) ("l" . latin) ("o" . lao) ("q" . tibetan) ("r" . japanese-roman) ("t" . thai) ("v" . vietnamese) ("w" . hebrew) ("y" . cyrillic) ("|" . can-break))))) (when (not sym) (error "unknown category code: %s" category-code)) (let ((item (list 'category (cdr sym)))) (setq sequence (cons (if negated (list 'not item) item) sequence)))))) ;; error (t (let* ((start (point)) (end (min (+ start 3) (point-max)))) (error "syntax error: %s" (buffer-substring start end)))))) (let ((item-seq (xr-rev-join-seq sequence))) (if (> (length item-seq) 1) (cons 'seq item-seq) (car item-seq))))) (defun xr-parse-alt () (let ((alternatives nil)) ; reversed (while (not (looking-at (rx (or "\\)" eos)))) (setq alternatives (cons (xr-parse-seq) alternatives)) (when (looking-at (rx "\\|")) (forward-char 2))) (if (> (length alternatives) 1) (cons 'or (reverse alternatives)) (car alternatives)))) (defun xr (re-string) "Convert a regexp string to rx notation." (with-temp-buffer (insert re-string) (goto-char (point-min)) (let ((rx (xr-parse-alt))) (when (looking-at (rx "\\)")) (error "unbalanced \\)")) rx))) --Apple-Mail-3--669938581 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit --Apple-Mail-3--669938581--