From: martin rudalics
To: Luc Teirlinck
Cc: schwab@suse.de, rms@gnu.org, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Unquoted special characters in regexps
Date: Fri, 03 Mar 2006 08:42:01 +0100
Message-ID: <4407F349.8090202@gmx.at>
In-Reply-To: <200603022326.k22NQhZ05807@raven.dms.auburn.edu>

> I do not see what this problem has to do with "\\]" vs ']'.
>
> This seems to be just a case of forgetting to double up `\' for Lisp
> syntax.

That's precisely what I meant.  If programmers consistently double up
backslashes for _all_ escaped brackets, it's usually simple to guess when
one of them has been omitted.  Otherwise you always have to consider the
possibility that the author wanted to close a character alternative here
and messed up some preceding part.

You have long-standing experience (or maybe some sixth sense) for
discovering wrong regexps faster than most of us.  But you should
occasionally think of less experienced programmers who try to guess the
motivations for writing an expression like

  (string-match "[^\\]\\(\\([\\][\\]\\)*\\)\"[ \t,]*" definition start)

in `mailalias.el'.  It's got no fewer than three backslashes preceding
non-escaped right brackets.  Can you tell me what the author wants to
match?
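
One possible reading, offered as a sketch and not as a statement of what
the author of `mailalias.el' intended: once the Lisp reader has halved
the backslashes, the regexp is one character that is not a backslash,
then any number of backslash pairs, then a double quote, then optional
spaces, tabs and commas.  The names `demo-quote-regexp' and
`demo-quote-end' are made up for illustration.

```elisp
;; The regexp quoted above, copied verbatim as a Lisp string.
(defconst demo-quote-regexp
  "[^\\]\\(\\([\\][\\]\\)*\\)\"[ \t,]*")

(defun demo-quote-end (string)
  "Return the end of the first match in STRING, or nil."
  (when (string-match demo-quote-regexp string)
    (match-end 0)))

;; STRING is: a "      The quote is not preceded by a backslash.
(demo-quote-end "a\"")
;; STRING is: a \ "    One backslash, so the quote counts as escaped.
(demo-quote-end "a\\\"")
;; STRING is: a \ \ "  A backslash pair, so the quote is unescaped.
(demo-quote-end "a\\\\\"")
```

The first and third calls should find a match and the second should
not, which is consistent with "find a double quote that is not escaped
by a backslash".
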
If, by default, I have to consider the possibility that a `]'
may either close a character alternative _or_ stand for itself, the
number of interpretations of such expressions explodes combinatorially.
Programmers should avoid confusion by not putting `\\' at the end of a
character alternative unless it's needed as in `[^\\]'.

> The present regexp is valid, but the syntax it is looking for seems
> bizarre.  On the other hand looking for things like:
>
> "[123] [5] [2034] "
>
> seems to make sense.

Because people are used to considering objects like "[123] [5] [2034]"
well-formed and objects like "123]", "]5]", "[2034 " bizarre.  Most
humans _do_ expect to find some sort of symmetry in the things they
observe.  Symmetry is a driving principle of mathematics and computer
science.  Often, it's a lack of symmetry that makes people aware of
faults or other anomalies.
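
The two roles of `]' discussed above can be checked directly.  This is
only a minimal sketch with made-up variable names; it relies on
`string-match' and on the fact that a backslash is not special inside a
character alternative.

```elisp
;; Forgetting to double the backslash: the Lisp reader drops it,
;; so "\]" is the same one-character string as "]".
(defvar demo-undoubled-same (string= "\]" "]"))

;; The escaped regexp \] and the bare regexp ] both match a literal ].
(defvar demo-escaped-pos (string-match "\\]" "a]"))
(defvar demo-bare-pos (string-match "]" "a]"))

;; Regexp [^\] : here the ] closes the alternative, which means
;; "any character except a backslash".
(defvar demo-not-backslash-pos (string-match "[^\\]" "\\a"))

;; Regexp [\]] : the first ] closes the alternative and the second
;; stands for itself, so this matches a backslash followed by ].
(defvar demo-backslash-bracket-pos (string-match "[\\]]" "x\\]"))
(defvar demo-bracket-alone-pos (string-match "[\\]]" "]"))
```

Only the last form should fail to match, since "]" contains no
backslash for the character alternative to consume.
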