From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kevin Rodgers <kevin.d.rodgers@gmail.com>
Newsgroups: gmane.emacs.help
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Thu, 10 Jul 2008 02:17:36 -0600
Message-ID: <g54gil$lt$1@ger.gmane.org>
References: <6af67261-c625-4aee-b2f8-e1247235c332@l64g2000hse.googlegroups.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1215677890 842 80.91.229.12 (10 Jul 2008 08:18:10 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 10 Jul 2008 08:18:10 +0000 (UTC)
To: help-gnu-emacs@gnu.org
In-Reply-To: <6af67261-c625-4aee-b2f8-e1247235c332@l64g2000hse.googlegroups.com>
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/55374>

Xah wrote:
> Emacs regex has an odd peculiarity in that it needs a lot of backslashes.
> More specifically, a string first needs to be properly escaped, and then
> it is passed to the regex engine.
> 
> For example, suppose you have this text “Sin[x] + Sin[y]” and you need
> to capture the x or y.
> 
> In Emacs I need to use
> “\\(\\[[a-z]\\]\\)”

If all you want to capture is the x or y (without the square brackets):

	"\\[\\([a-z]\\)\\]"

> for the actual regex
> “\(\[[a-z]\]\)”.

The enclosing double quotes are misleading in this context.  I would
simply write (again, capturing the letter but not the brackets):

	\[\([a-z]\)\]
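
To make the two forms concrete, here is a minimal Emacs Lisp sketch (the
variable and function names are only for illustration).  The string
literal with doubled backslashes is read as the 13-character regular
expression above, and matching it against the sample text captures the
letter alone:

```elisp
;; The literal as typed in Lisp source; the reader halves the backslashes.
(defvar my-bracket-regexp "\\[\\([a-z]\\)\\]"
  "Hypothetical name: a bracketed lowercase letter, capturing the letter.")

(defvar my-sample-text "Sin[x] + Sin[y]")

;; The string the regex engine receives is \[\([a-z]\)\] (13 characters).
(length my-bracket-regexp)

(defun my-first-bracketed-letter (text)
  "Return the first bracketed lowercase letter in TEXT, or nil."
  (when (string-match my-bracket-regexp text)
    ;; Group 1 is the letter without the surrounding brackets.
    (match-string 1 text)))

(my-first-bracketed-letter my-sample-text)
```

Evaluating the last form should return "x".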

Could you show the corresponding syntax in Perl or Java, as both a
conceptual (unquoted) regular expression and as a string literal (for
comparison)?

> Here's somewhat typical but long regex for matching a html image tag
> 
> (search-forward-regexp "<img +src=\"\\([^\"]+\\)\" +alt=\"\\([^\"]+\\)?\" +width=\"\\([0-9]+\\)\" +height=\"\\([0-9]+\\)\" ?>" nil t)
> 
> The toothpick syndrome gets crazy, making the already difficult regex syntax
> impossible to read and hard to code.

One reason Emacs regular expressions are hard to read in this way is
that parentheses are ordinary characters that must be escaped to act as
grouping delimiters, whereas other languages (such as Perl and Java) do
the opposite: parentheses are metacharacters that must be escaped to be
matched literally.
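
A small sketch of that difference in Emacs Lisp (the sample string
"f(x)" is made up for illustration):

```elisp
;; Unescaped parentheses are ordinary characters: this matches the
;; three literal characters "(x)", starting at index 1 of "f(x)".
(string-match "(x)" "f(x)")

;; Escaped parentheses form a group: this matches just "x" (index 2)
;; and records it as group 1.
(let ((text "f(x)"))
  (when (string-match "\\(x\\)" text)
    (list (match-beginning 0) (match-string 1 text))))
```

The first form should return 1 and the second (2 "x").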

> My question is, why does elisp's regex have this 2-step process? Is this
> some design decision, or did it just happen that way historically?

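It follows from two separate layers of interpretation.  First, the Lisp
reader converts the string literal in your program (its surface
representation) into the string itself; second, the regex engine
interprets the characters of that string as a regular expression.  Both
layers use the backslash as their escape character, so every backslash
meant for the regex engine has to be doubled in the literal.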

This is just like writing a shell command that calls the grep program
with the regular expression in double quotes: the shell strips the
quotes, so grep never "sees" them.
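
The same separation can be observed from inside Emacs.  This is only a
sketch with a hypothetical variable name; it compares the string itself
with its printed surface representation:

```elisp
;; One regexp, written as a Lisp string literal.
(defvar my-group-regexp "\\(x\\)")

;; The string itself has only 5 characters: \ ( x \ )
(length my-group-regexp)

;; Build the same string from character codes, with no literal syntax
;; involved (92 is backslash, 40 and 41 are parentheses, 120 is x).
(equal my-group-regexp (string 92 40 120 92 41))

;; %s yields the characters the regex engine sees; %S yields the
;; surface representation that the Lisp reader would accept.
(format "%s" my-group-regexp)
(format "%S" my-group-regexp)
```

The %s form yields the five characters \(x\), while the %S form yields
the quoted, doubled-backslash literal as it appears in source code.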

> Second question: can't elisp create something like a “regex-string” wrapper
> function that automatically takes care of the quoting? I can't see how
> this might be difficult.

All you need to do is specify a regular expression syntax and a string
literal syntax that don't both assign a meaning to the same character
(here: the backslash).
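
One existing notation that avoids the shared backslash, offered only as
an illustration, is the rx macro that ships with Emacs: the regexp is
written as a Lisp form, so no backslash is needed at the source level.
A minimal sketch, assuming rx is available in your Emacs (the function
name is hypothetical):

```elisp
(require 'rx)

(defun my-rx-bracketed-letter (text)
  "Return the first bracketed lowercase letter in TEXT, or nil."
  ;; rx translates the form into an ordinary regexp string at
  ;; macro-expansion time; `group' plays the role of \( ... \).
  (when (string-match (rx "[" (group (any "a-z")) "]") text)
    (match-string 1 text)))

(my-rx-bracketed-letter "Sin[x] + Sin[y]")
```

The last form should return "x", the same capture as the
backslash-heavy string literal.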

-- 
Kevin Rodgers
Denver, Colorado, USA