From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin Rodgers
Newsgroups: gmane.emacs.help
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Thu, 10 Jul 2008 02:17:36 -0600
References: <6af67261-c625-4aee-b2f8-e1247235c332@l64g2000hse.googlegroups.com>
In-Reply-To: <6af67261-c625-4aee-b2f8-e1247235c332@l64g2000hse.googlegroups.com>
To: help-gnu-emacs@gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Thunderbird 2.0.0.14 (Macintosh/20080421)
List-Id: Users list for the GNU Emacs text editor
Xref: news.gmane.org gmane.emacs.help:55374

Xah wrote:
> emacs regex has a odd peculiarity in that it needs a lot backslashes.
> More specifically, a string first needs to be properly escaped, then
> this passed to the regex engine.
>
> For example, suppose you have this text “Sin[x] + Sin[y]” and you need
> to capture the x or y.
>
> In emacs i need to use
> “\\(\\[[a-z]\\]\\)”

If all you want to capture is the x or y (without the square brackets):

"\\[\\([a-z]\\)\\]"

> for the actual regex
> “\(\[[a-z]\]\)”.

The enclosing double quotes are misleading in this context.
I would simply write (again, capturing the letter but not the brackets):

\[\([a-z]\)\]

Could you show the corresponding syntax in Perl or Java, both as a
conceptual (unquoted) regular expression and as a string literal (for
comparison)?

> Here's somewhat typical but long regex for matching a html image tag
>
> (search-forward-regexp " \" +width=\"\\([0-9]+\\)\" +height=\"\\([0-9]+\\)\" ?>" nil t)
>
> The toothpick syndrome gets crazy making already difficult regex syntax
> impossible to read and hard to code.

One of the reasons Emacs regular expressions are hard to read in this
way is that parentheses are ordinary characters that must be escaped
to act as grouping delimiters, whereas languages such as Perl and Java
do the opposite: there, parentheses are metacharacters that must be
escaped to be matched literally.

> My question is, why is elisp's regex has this 2-steps process? Is this
> some design decision or just happened that way historically?

It is due to two separate layers. The first is the distinction between
a string and the syntax for representing it in a program. The second is
the interpretation of the characters in the string itself (as opposed
to its surface representation) as a regular expression. This is just
like writing a shell command (using double quotes around the regular
expression) that calls the grep program (which never "sees" the quotes).

> Second question: can't elisp create some like “regex-string” wrapper
> function that automatically takes care of the quoting? I can't see how
> this might be difficult?

All you need to do is specify a regular expression syntax and a string
literal syntax that don't define meanings for the same character (here:
backslash).

-- 
Kevin Rodgers
Denver, Colorado, USA
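
The two layers described in the message can be checked directly in Emacs Lisp. What follows is a minimal sketch, not part of the original post: the variable names `text` and `regexp` are illustrative, and the `rx` macro (which ships with Emacs) is an assumption about what a backslash-free notation could look like, not something the message proposes.

```elisp
(let* ((text "Sin[x] + Sin[y]")
       ;; Source text of the literal:         "\\[\\([a-z]\\)\\]"
       ;; String the Lisp reader builds from it: \[\([a-z]\)\]
       (regexp "\\[\\([a-z]\\)\\]"))
  (list
   ;; The regexp engine receives 13 characters, not the doubled
   ;; backslashes typed in the source file.
   (length regexp)
   ;; Group 1 captures the letter without the brackets.
   (and (string-match regexp text)
        (match-string 1 text))
   ;; The same match with the regexp built by `rx' (illustrative form),
   ;; so no backslashes appear in the source at all.
   (and (string-match (rx "[" (group (any "a-z")) "]") text)
        (match-string 1 text))))
;; => (13 "x" "x")
```

The form can be evaluated with `M-:` or with `C-j` in the `*scratch*` buffer. The first element of the result shows that string-literal escaping is already finished before the regular expression is interpreted.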