From: Luc Teirlinck
Newsgroups: gmane.emacs.devel
Subject: Re: Unquoted special characters in regexps
Date: Sun, 5 Mar 2006 09:32:27 -0600 (CST)
Message-ID: <200603051532.k25FWR011484@raven.dms.auburn.edu>
Cc: ttn@gnu.org, emacs-devel@gnu.org

Martin Rudalics wrote:

   Luc Teirlinck wrote:

   > If you consider in "[a]b]" the first and the second `]' to be _both_
   > inside or _both_ outside the context of a character alternative, then
   > it would be impossible to determine solely from that notion of context
   > which of the two `]' has to be taken literally.

   That's what I don't get tired of saying for one week already. You
   always denied it by saying things like

      The special meaning of `]' inside a character alternative is
      obviously to close that alternative.

   and

      `]' has the special meaning of closing a character alternative
      _inside_ a character alternative

Look, I am getting tired of this endless yes-no discussion. But you
have completely misunderstood everything I have been saying. Let me
try once more to explain.

Figuring out whether a `]' has to be taken literally or not is a
completely trivial problem, but you are making it difficult for
yourself for counterproductive philosophical reasons.

Start at the beginning of the regexp. `[' is special, `]' is not,
because we are outside a character alternative. After the first
unquoted `[' is read, which is special because it was typed outside a
character alternative, we are inside a character alternative. `[' is
no longer special, but `]' is (except immediately after the `[' or
"[^"), because we are now inside a character alternative. After the
next `]' is read, which is special because it was typed inside a
character alternative, we are back outside a character alternative:
`[' is special, `]' is not.

To summarize, `]' is only special inside a character alternative, `['
is only special outside one.

Note how easy this is. Unlike for, say, \\( you do not even have to
keep track of which `[' matches which `]', because there is no
nesting. All you need to keep track of is whether you are inside or
outside a character alternative.

You are making things difficult by treating `[' and `]' in regexps as
if they had the usual open-close parentheses syntax, like \\( and \\).
They do *not*, and that is the cause of all your misunderstandings. In
"[1[2]3]" the first `]' closes the first `[', and "balance" makes no
sense for the other `[' and `]'. If `[' and `]' had the usual
open-close parentheses syntax, the 2 would be inside a nested
character alternative, two levels deep. But there is no such thing as
nested character alternatives, because, in regexps, `[' and `]' do not
have the usual open-close parentheses syntax (unlike, say, in Lisp
vectors).

Sincerely,

Luc.
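
As an illustration of the inside/outside rule described above, here is
a minimal sketch in Python. The function name `classify_brackets' is
hypothetical, and this is not how Emacs itself implements the scan. It
works on the regexp itself rather than on a Lisp string (so a quoting
backslash appears once, not twice), and it deliberately ignores
character classes such as [:alpha:].

```python
def classify_brackets(regexp):
    """Return (index, char, is_special) for every '[' and ']' in regexp.

    Only one bit of state is kept: whether the scan is currently inside
    a character alternative.  No matching or nesting is tracked.
    """
    result = []
    inside = False
    i = 0
    n = len(regexp)
    while i < n:
        ch = regexp[i]
        if not inside:
            if ch == "\\":
                # Outside an alternative, a backslash quotes the next char.
                if i + 1 < n and regexp[i + 1] in "[]":
                    result.append((i + 1, regexp[i + 1], False))
                i += 2
                continue
            if ch == "[":
                # Unquoted '[' outside: special, opens an alternative.
                result.append((i, ch, True))
                inside = True
                i += 1
                if i < n and regexp[i] == "^":
                    i += 1  # skip the complement marker
                if i < n and regexp[i] == "]":
                    # ']' right after '[' or '[^' is literal.
                    result.append((i, "]", False))
                    i += 1
                continue
            if ch == "]":
                # ']' outside an alternative is not special.
                result.append((i, ch, False))
        else:
            if ch == "]":
                # ']' inside: special, closes the alternative.
                result.append((i, ch, True))
                inside = False
            elif ch == "[":
                # '[' inside an alternative is not special.
                result.append((i, ch, False))
        i += 1
    return result


if __name__ == "__main__":
    for index, char, special in classify_brackets("[1[2]3]"):
        print(index, char, "special" if special else "literal")
```

For "[1[2]3]" the sketch marks the first `[' and the first `]' as
special and the remaining `[' and `]' as literal, which is the reading
given above.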