From: Alan Mackenzie
Newsgroups: gmane.emacs.help
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Thu, 10 Jul 2008 09:39:33 +0000
Message-ID: <20080710093932.GA2649@muc.de>
References: <6af67261-c625-4aee-b2f8-e1247235c332@l64g2000hse.googlegroups.com>
To: Xah
Cc: help-gnu-emacs@gnu.org

On Wed, Jul 09, 2008 at 03:30:27AM -0700, Xah wrote:
> emacs regex has a odd pecularity in that it needs a lot backslashes.
> More specifically, a string first needs to be properly escaped, then
> this passed to the regex engine.

Yes.  The greatest number of consecutive backslashes I've seen (in a
non-joke context) is 10.

> For example, suppose you have this text “Sin[x] + Sin[y]” and you need
> to capture the x or y.

Ironically, Xah, you are doing the same sort of thing in your post,
using crazy quote characters (if that is indeed what they are), 0x5397c
and 0x5397d (according to C-u C-x =).  Over my SSH link to my SSP, your
quotes look something like "â~@~]", and are most difficult to read
without a pair of sunspecs which filter out the UTF.  Could you,
perhaps, use the standard ASCII quotes 0x22 and 0x27 here, please?

> In emacs i need to use
> “\\(\\[[a-z]\\]\\)”
> for the actual regex
> “\(\[[a-z]\]\)”.
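
To make the two steps concrete, here is a minimal sketch (my own
illustration, not taken from the Emacs sources; the function name
escape-levels-demo is made up).  The Lisp reader first turns each
doubled backslash in the string literal into a single one, and only the
resulting string ever reaches the regexp engine:

```elisp
;; Hypothetical demo function, for illustration only.
(defun escape-levels-demo ()
  "Return (LENGTH MATCH-START GROUP-1) for the regexp from the quoted text."
  (let ((re "\\(\\[[a-z]\\]\\)")      ; the reader yields  \(\[[a-z]\]\)
        (text "Sin[x] + Sin[y]"))
    (list (length re)                 ; characters left after reading
          (string-match re text)      ; index where the first match starts
          (match-string 1 text))))    ; text captured by group 1

(escape-levels-demo)                  ; => (13 3 "[x]")
```

Evaluate it in *scratch* with C-j: the length of 13 shows that the
regexp engine sees only one backslash per escape, not two.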

> Here's somewhat typical but long regex for matching a html image tag

> (search-forward-regexp " \" +width=\"\\([0-9]+\\)\" +height=\"\\([0-9]+\\)\" ?>" nil t)

> The toothpick syndrom gets crazy making already difficult regex syntax
> impossible to read and hard to code.

> My question is, why is elisp's regex has this 2-steps process? Is this
> some design decision or just happened that way historically?

> Second question: can't elisp create some like “regex-string” wrapper
> function that automatically takes care of the quoting? I can't see how
> this might be difficult?

Well, I've hacked up a function to display regexps in *scratch*,
concentrating in particular on deeply nested \( .... \| .... \)
constructs.  It doesn't work so well when the regexp's length exceeds
the window width, but it could be enhanced:

#########################################################################
(defun translate-rnt (regexp)
  "REGEXP is a string.  Translate any \\t \\n \\r and \\f characters
to weird non-ASCII printable characters: \\t to Î (206, \\xCE), \\n
to ñ (241, \\xF1), \\r to ® (174, \\xAE) and \\f to £ (163, \\xA3).
The original string is modified."
  (let (pos ch)                         ; ch: the character found at pos
    (while (setq pos (string-match "[\t\n\r\f]" regexp))
      (setq ch (aref regexp pos))
      (aset regexp pos
            (cond ((eq ch ?\t) ?Î)
                  ((eq ch ?\n) ?ñ)
                  ((eq ch ?\r) ?®)
                  (t ?£))))
    regexp))

(defun pp-regexp (regexp)
  "Pretty print a regexp.  This means, contents of \\\\\(s are lowered a line."
  (or (stringp regexp)
      (error "Parameter is not a string"))
  (let ((depth 0)
        (re (copy-sequence regexp))
        (start 0)     ; earliest position still without an acm-depth property.
        (pos 0)       ; current analysis position.
        (max-depth 0) ; How many lines do we need to print?
        (min-depth 0) ; Pick up "negative depth" errors.
        pr-line       ; output line being constructed
        line-no       ; line number of pr-line, varies between min-depth and max-depth.
        ch            ; the ( | or ) which follows the backslash just matched.
        )
    (translate-rnt re)
    ;; apply acm-depth properties to the whole string.
    (while (< start (length re))
      (setq pos (string-match "\\\\\\((\\(\\?:\\)?\\||\\|)\\)" re start))
      (put-text-property start (or pos (length re)) 'acm-depth depth re)
      (when pos
        (setq ch (aref (match-string 1 re) 0))
        (cond
         ((eq ch ?\()
          (put-text-property pos (match-end 1) 'acm-depth depth re)
          (setq depth (1+ depth))
          (if (> depth max-depth) (setq max-depth depth)))
         ((eq ch ?\|)
          (put-text-property pos (match-end 1) 'acm-depth (1- depth) re)
          (if (< (1- depth) min-depth) (setq min-depth (1- depth))))
         (t                             ; (eq ch ?\))
          (setq depth (1- depth))
          (if (< depth min-depth) (setq min-depth depth))
          (put-text-property pos (match-end 1) 'acm-depth depth re))))
      (setq start (if pos (match-end 1) (length re))))
    ;; print out the strings
    (setq line-no min-depth)
    (while (<= line-no max-depth)
      (with-current-buffer "*scratch*"
        (goto-char (point-max))
        (insert ?\n)
        (setq pr-line "")
        (setq start 0)
        (while (< start (length re))
          (setq pos (next-single-property-change start 'acm-depth re (length re)))
          (setq depth (get-text-property start 'acm-depth re))
          (setq pr-line
                (concat pr-line
                        (if (= depth line-no)
                            (substring re start pos)
                          (make-string (- pos start) ?\s))))
          (setq start pos))
        (insert pr-line)
        (setq line-no (1+ line-no))))))
#########################################################################

> Thanks.

> Xah

-- 
Alan Mackenzie (Nuremberg, Germany).
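
PS: On the second question, one existing way to avoid writing the
backslashes by hand is the rx macro from rx.el, which builds the regexp
string from Lisp forms.  A minimal sketch, my own example rather than
anything from this thread; the helper name bracketed-letter-match is
made up:

```elisp
(require 'rx)

;; Hypothetical helper, for illustration only.
(defun bracketed-letter-match (text)
  "Return (START . GROUP-1) for the first bracketed lowercase letter in TEXT.
Return nil if TEXT contains no such match."
  ;; rx expands into an ordinary regexp string, so no backslashes
  ;; have to be typed here at all.
  (when (string-match (rx (group "[" (any "a-z") "]")) text)
    (cons (match-beginning 0) (match-string 1 text))))

(bracketed-letter-match "Sin[x] + Sin[y]")   ; => (3 . "[x]")
```

This trades compactness for readability: the form is longer than the
string, but each piece says what it matches.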