From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luc Teirlinck
Newsgroups: gmane.emacs.devel
Subject: Re: Unquoted special characters in regexps
Date: Mon, 27 Feb 2006 18:30:13 -0600 (CST)
Message-ID: <200602280030.k1S0UDE07149@raven.dms.auburn.edu>
References: <4400AD8E.5050001@gmx.at> <4400BBB1.2050800@gmx.at> <200602252213.k1PMDBP24413@raven.dms.auburn.edu> <4401A98D.3070809@gmx.at> <4401E0F2.7030800@gmx.at> <4401FCBA.1070206@gmx.at>
Original-To: rms@gnu.org
Cc: rudalics@gmx.at, emacs-devel@gnu.org, schwab@suse.de
In-reply-to: (message from Richard Stallman on Mon, 27 Feb 2006 14:03:42 -0500)
Xref: news.gmane.org gmane.emacs.devel:51054

None of the messages I sent on this (or on anything else) in the last
few days made it to emacs-devel, although all other people's responses
did, albeit after some delay.  I just got messages saying that local
delivery failed.  So I will have to repeat some things that I already
said before.

Richard Stallman wrote:

    However, that doesn't necessarily mean the manual is wrong.  There is
    more than one way to understand the word "special".  At the most
    literal level, ] is not special; if you write it without \\, the
    regexp compiler won't misunderstand it.

`]', like `-', is only special in the context of a character
alternative, that is, if you are already inside a character
alternative when you type it.  By contrast, `[' and all other special
characters (except `^') are only special outside that context.  All
characters that are special outside character alternatives are never
special if you precede them with a backslash.  This is true even for
`^'.

    This is why it is good to precede them with a backslash even if they
    are not special.  That way, the reader can see that they are not
    special, without studying the regexp.

On the other hand, a backslash _never_ eliminates the special meaning
of a `]' or `-' that has a special meaning.

There are two questions here: whether a `]' outside a character
alternative should be quoted, and whether any changes to the Elisp
manual are required.  In this posting, I will only discuss the first.

First of all, there are (surprisingly) many occurrences of "\\]" in
the Emacs source where the `]' _is_ special and closes a character
alternative that contains a backslash.  Reportedly, quoting a `]' with
a backslash _inside_ a character alternative works in some other
regexp implementations, such as AWK.  So if I see "\\]", I have to
worry about three possibilities: it might deliberately close a
character alternative which includes a backslash, it might do so by
accident because the author tried to quote a `]' inside a character
alternative (and hence the regexp is buggy), or it might be a
deliberately quoted `]' outside a character alternative.  If I see `]'
without a preceding "\\", I only have to worry about whether or not it
closes a character alternative, and not about the further possibility
of a bug.

In summary, I believe that quoting a `]' outside a character
alternative only adds clutter and a third possibility to worry about.

There are places in the Emacs code that quote a `]' outside a
character alternative.  Even if we decide that this is undesirable, I
do not fancy finding and changing them all.  But we could change the
behavior of `regexp-quote' and `regexp-opt', which currently quote
such a `]'.
That could be done with the following trivial patch, which I could
install if that is what we decide to do:

===File ~/search.c-diff=====================================
*** search.c	06 Feb 2006 16:02:24 -0600	1.206
--- search.c	27 Feb 2006 00:16:42 -0600
***************
*** 3066,3072 ****
  
    for (; in != end; in++)
      {
!       if (*in == '[' || *in == ']'
  	  || *in == '*' || *in == '.' || *in == '\\'
  	  || *in == '?' || *in == '+'
  	  || *in == '^' || *in == '$')
--- 3066,3072 ----
  
    for (; in != end; in++)
      {
!       if (*in == '['
  	  || *in == '*' || *in == '.' || *in == '\\'
  	  || *in == '?' || *in == '+'
  	  || *in == '^' || *in == '$')
============================================================
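
To make the effect of the patch concrete, here is a minimal standalone
sketch in C.  It is not the Emacs code: `quote_regexp' and
`count_closing_brackets' are hypothetical names made up for this
illustration, and the scanner uses a simplified model of the syntax
that ignores character classes such as [:alpha:].  The first function
applies the patched condition, the second counts how many `]' in a
regexp actually close a character alternative under that model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the Emacs function: copy SRC to DST, putting
   a backslash before each character that is special outside a character
   alternative.  `]' is deliberately absent from the list, as in the
   patched condition.  DST needs room for 2 * strlen (SRC) + 1 bytes.  */
static void
quote_regexp (const char *src, char *dst)
{
  assert (src != NULL && dst != NULL);
  for (; *src != '\0'; src++)
    {
      if (*src == '['
          || *src == '*' || *src == '.' || *src == '\\'
          || *src == '?' || *src == '+'
          || *src == '^' || *src == '$')
        *dst++ = '\\';
      *dst++ = *src;
    }
  *dst = '\0';
}

/* Hypothetical helper: count the `]' characters in REGEXP that close a
   character alternative.  Assumed model: outside an alternative, a
   backslash and the character after it form one construct; inside one,
   a backslash is an ordinary member, and a `]' placed first (or right
   after a leading `^') is an ordinary member too.  */
static int
count_closing_brackets (const char *regexp)
{
  int closing = 0;
  const char *p = regexp;

  while (*p != '\0')
    {
      if (*p == '\\' && p[1] != '\0')
        p += 2;                 /* the next char cannot open or close */
      else if (*p == '[')
        {
          p++;
          if (*p == '^')
            p++;
          if (*p == ']')
            p++;                /* a leading `]' is a literal member */
          while (*p != '\0' && *p != ']')
            p++;                /* no special treatment of backslash */
          if (*p == ']')
            {
              closing++;
              p++;
            }
        }
      else
        p++;                    /* includes a stray, literal `]' */
    }
  return closing;
}
```

Under these assumptions, quoting the four characters a[0] gives
a\[0] with the `]' left bare, and the scanner finds no closing bracket
in the result.  For the four characters [a\] it finds one, which is the
case where "\\]" in Lisp source closes a character alternative that
contains a backslash.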