From: martin rudalics
Newsgroups: gmane.emacs.devel
Subject: Re: Unquoted special characters in regexps
Date: Sun, 26 Feb 2006 12:32:18 +0100
Message-ID: <440191C2.4080609@gmx.at>
References: <4400AD8E.5050001@gmx.at> <4400BBB1.2050800@gmx.at>
Original-To: Andreas Schwab
Cc: emacs-devel@gnu.org

>>>This is incorrect.  ']' is only special in a bracket expression (like
>>>'-').
>>
>>`]' is _also_ special in a character alternative,
>
>
> A bracket expression has a completely different set of special characters.
> For example, '\' and '$' are not special there.
>
>
>>`-' is special _only_ in a character alternative.
>
>
> Just like ']'.
>
> Andreas.
>

We would have to agree on the semantics of the term "special" first.  In
Elisp descriptions this term is overloaded.  Take, for example, the
following excerpt from the Elisp tutorial:

     Indeed more than one such mark or brace may precede the space.
     These require a expression that looks like this:

          []\"')}]*

     In this expression, the first `]' is the first character in the
     expression; the second character is `"', which is preceded by a
     `\' to tell Emacs the `"' is _not_ special.  The last three
     characters are `'', `)', and `}'.

It's confusing because we know from the Elisp manual that

     The special characters are `.', `*', `+', `?', `[', `]', `^', `$', and
     `\'; no new special characters will be defined in the future.

hence a double-quote is never "special" in terms of regexp semantics.
Why should we have to tell Emacs that it is "_not_ special" then?
The answer is, obviously, that the Elisp read syntax for regexps is the
string data type and the tutorial's "special" indeed refers to string
semantics.  Hence when you say that "'\' and '$' are not special there"
you probably don't mean the special semantics of the backslash within
strings.

Now let's agree on the term "there".  Reasonably, "there" is the
sequence of characters obtained after stripping both the opening bracket
_and_ the closing bracket of a character alternative.  Otherwise, the
sentence from the Elisp manual "To include a `]' in a character
alternative, you must make it the first character." wouldn't make sense.
`]' is special inside a character alternative because it may appear in
one and only one position - namely the first.  And the semantics of the
`]' in the first position is "match one `]'".  The semantics of a `]'
closing a character alternative is completely different from that.

From an operational point of view - that of the Elisp interpreter - you
_can_ say that `]' is not special in regexps.  If that's the preferred
point of view, it's sufficient to remove `]' from the list of special
characters in the respective manuals and treat it like `-' as you
propose.  And you wouldn't have to mention "poor practice" in the Elisp
manual at all - anything the Elisp interpreter interprets as intended by
the programmer would be valid.

There exists, however, a functional subset of Elisp (Elisp without
setqs, iterators, ...) amenable to mathematical reasoning like proving
correctness or validity of your code.  And mathematicians don't like to
reason about malformed constructs like "(a + 3" or "a + 3)".  They
prefer something like "a + 3" or "(a + 3)" instead.  They do rely on the
special semantics of "(" _and_ ")" within an expression.  Hence, saying
that `[' is special and `]' is not would be tantamount to removing
regexps from the subset of Elisp amenable to mathematical reasoning.
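
For anyone who wants to check the two meanings of "special" discussed
above, here is a minimal sketch that can be evaluated form by form in
*scratch*.  The subject strings ("x]", "a$b", and so on) are made up for
illustration and appear nowhere in the manuals; the regexp itself is the
one from the tutorial excerpt.  The last form assumes that an unmatched
`[' signals `invalid-regexp', which is what I would expect from the
interpreter.

```elisp
;; String semantics: `\"' in the read syntax is just `"' in the regexp.
;; The resulting regexp has 8 characters and contains no backslash.
(length "[]\"')}]*")                          ; => 8
(memq ?\\ (append "[]\"')}]*" nil))           ; => nil

;; Regexp semantics: a `]' in first position means "match one `]'".
(string-match "[]a]" "x]")                    ; => 1

;; The tutorial regexp matches a run of closing marks and braces.
(string-match "\\`[]\"')}]*\\'" "\")]}'")     ; => 0

;; Inside a character alternative `\' and `$' stand for themselves.
(string-match "[\\$]" "a$b")                  ; => 1
(string-match "[\\$]" "a\\b")                 ; => 1

;; Operational view: outside a character alternative `]' is ordinary ...
(string-match "]" "a]")                       ; => 1

;; ... while an unmatched `[' is rejected as malformed.
(condition-case err
    (string-match "[a" "a")
  (invalid-regexp (car err)))                 ; => invalid-regexp
```

The last two forms show the asymmetry in question: the interpreter
accepts a lone `]' but refuses a lone `['.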