From: martin rudalics
Newsgroups: gmane.emacs.devel
Subject: Re: Unquoted special characters in regexps
Date: Tue, 28 Feb 2006 11:27:01 +0100
Message-ID: <44042575.8080806@gmx.at>
References: <4400AD8E.5050001@gmx.at> <4400BBB1.2050800@gmx.at> <200602252213.k1PMDBP24413@raven.dms.auburn.edu> <4401A98D.3070809@gmx.at> <4401E0F2.7030800@gmx.at> <4401FCBA.1070206@gmx.at> <200602280030.k1S0UDE07149@raven.dms.auburn.edu>
In-Reply-To: <200602280030.k1S0UDE07149@raven.dms.auburn.edu>
To: Luc Teirlinck
Cc: schwab@suse.de, rms@gnu.org, emacs-devel@gnu.org

> `]', like `-' are only special in the context of a character
> alternative, that is if, before you type them, you are in a character
> alternative.  By contrast, `[' and all other special characters
> (except `^') are only special outside that context.

You can talk about a context iff you are able to grammatically specify
it.  In order to talk about the contents of a string you must be able to
determine the character sequences opening and closing strings.  It would
be strange to say, for example, that the double-quote opening an Elisp
string is outside the context of the string and the double-quote that
closes it inside.  It would be strange to say that the bracket opening a
character alternative is outside the context of the alternative and the
closing bracket inside.

> All characters that are special outside character alternatives are
> never special if you precede them with a backslash.  This is true even
> for `^'.  This is why it is good to precede them with a backslash even
> if they are not special.
> That way, the reader can see that they are
> not special, without studying the regexp.

I agree.  Let's try to read the following definition from `cc-fonts.el':

(defconst autodoc-font-lock-doc-comments
  `(("@\\(\\w+{\\|\\[\\([^\]@\n\r]\\|@@\\)*\\]\\|[@}]\\|$\\)"
     ...

It tells me that there are two character alternatives started by an
unquoted `[' and terminated by an unquoted `]'.  It also tells me that
it's meant to match a bracketed expression as represented by `\\[' and
`\\]' - I quickly exclude the possibility that the backslashes preceding
any of these brackets are quoted backslashes in a character alternative.
And, finally, the expression tells me that the author was probably
uncertain about how to put a `]' inside a complemented character
alternative, hence (s)he quoted it with a single backslash.  In any case
I have no difficulty reading the expression although I know nothing
about its purpose.

You propose to write

(defconst autodoc-font-lock-doc-comments
  `(("@\\(\\w+{\\|\\[\\([^\]@\n\r]\\|@@\\)*]\\|[@}]\\|$\\)"
     ...

instead.  In that case, when I look at the character sequence `*]' I
would have to consider the case that the `]' closes some character
alternative.  Only after I have resolved that would I be able to say
that the `]' should indeed match a right bracket.  And I would still
have to check whether the backslashes preceding the `\\[' are quoted
backslashes in a character set.

> First of all, there are (surprisingly) many occurrences of "\\]" in
> the Emacs source, where the `]' _is_ special and closes a character
> alternative that contains a slash.  Reportedly quoting a `]' with a
> backslash _inside_ a character alternative works in some other regexp
> implementations such as AWK.
> So if I see "\\]" I have to worry about
> three possibilities: it might deliberately close a character
> alternative which includes a slash, it might do so by accident because
> the author tried to quote a `]' inside a character alternative (and
> hence the regexp is buggy), or it might be a deliberately quoted `]'
> outside a character alternative.

The Emacs manual clearly states that the backslash is not special in a
character set.  But I admit that users of other languages do have
problems when writing Elisp regexps.  That's why a clear and unambiguous
definition of these concepts is important.

> If I see `]' without preceding "\\", I only have to worry about
> whether or not it closes a character alternative, and not about the
> third possibility of a bug.

When I try to read a regular expression I do not worry about the
possibility of a bug in the first place.  I try to understand what the
author wanted to match.

> There are places in the Emacs code that quote a `]' outside a
> character alternative.  Even if we decide that this is undesirable, I
> do not fancy finding and changing them all.  But we could change the
> behavior of `regexp-quote' and `regexp-opt' which currently quote
> such `]'.  That could be done with the following trivial patch, which
> I could install if that is what we decide to do:

Given the number of regular expressions that users have created with
these functions and manually inserted in their code, that would be
confusing indeed.
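
For reference, the bracket and backslash rules argued about in this
message can be checked with a few forms.  This is only a sketch: it
assumes the regexp syntax described in the Emacs manual, and the test
strings are made up for illustration and do not come from `cc-fonts.el'.

```elisp
;; The Lisp reader drops the backslash in "\]", so the single
;; backslash in the cc-fonts string never reaches the regexp engine.
(string= "[^\]@\n\r]" "[^]@\n\r]")      ; => t

;; `]' placed first after `^' is a member of the (complemented) set.
(string-match "[^]@\n\r]" "]")          ; => nil
(string-match "[^]@\n\r]" "x")          ; => 0

;; Outside a character alternative, `]' matches itself whether or
;; not it is preceded by a (regexp-level) backslash.
(string-match "\\[x*\\]" "[xx]")        ; => 0
(string-match "\\[x*]" "[xx]")          ; => 0

;; Inside a character alternative the backslash is not special:
;; the regexp [\] is a set containing only a backslash.
(string-match "[\\]" "\\")              ; => 0

;; Hence "[^\\]]" does not put `]' into the set.  It reads as
;; "any character but a backslash, followed by a literal `]'".
(string-match "[^\\]]" "x]")            ; => 0
(string-match "[^\\]]" "]")             ; => nil
```

The last two forms show the accidental case mentioned in the quoted
text: an author who expects "\\]" to quote a `]' inside a character
alternative gets a set that ends at that `]' instead.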