unofficial mirror of emacs-devel@gnu.org 
From: martin rudalics <rudalics@gmx.at>
Cc: schwab@suse.de, rms@gnu.org, emacs-devel@gnu.org
Subject: Re: Unquoted special characters in regexps
Date: Wed, 01 Mar 2006 14:00:05 +0100
Message-ID: <44059AD5.4030602@gmx.at>
In-Reply-To: <200602282257.k1SMvqU26046@raven.dms.auburn.edu>

 > Martin Rudalics wrote:
 >
 >    It would be strange to say, for example, that the double-quote
 >    opening an Elisp string is outside the context of the string and
 >    the double-quote that closes it inside.
 >
 > I do not see why you consider this strange.  Quite to the contrary,
 > this is exactly what allows one to determine whether a `"' opens or
 > closes a string.  `"' is special both inside and outside the context
 > of a string.  But its special meaning depends on that context.
 > Outside the context of a string `"' starts a string, inside the
 > context of a string, `"' ends a string.  So an opening `"' is opening
 > _because_ it occurs outside of a string context and the closing `"' is
 > the closing one _because_ it occurs inside a string context.
 >
 > Note that the GNU regexp manual, node `(regex)List Operators' agrees
 > with Andreas and me that `[' is special _outside_ a character alternative
 > (by stating that it is ordinary inside one) and explicitly states that
 > `]' has the special meaning of closing a character alternative
 > _inside_ a character alternative.  (Note that it refers to character
 > alternatives as "lists".)

If you refer to section "3.6 List Operators ([ ... ] and [^ ... ])" of
the GNU regex manual I can extract three relevant sentences:

"A matching list matches a single character represented by one of the
list items. You form a matching list by enclosing one or more items
within an open-matching-list operator (represented by `[') and a
close-list operator (represented by `]')."

If you deduce here that the "close-list operator" is part of the "items
within" you can deduce that the "open-matching-list" operator is part of
the "items within" as well.

"`]' ends the list if it's not the first list item. So, if you want to
make the `]' character a list item, you must put it first."

`]' is special inside a character list - the "items within" mentioned
above - because it has to appear as the first element of that list in
order to be taken as a list item.

"`-' represents the range operator (see section 3.6.2 The Range Operator
(-)) if it's not first or last in a list or the ending point of a range."

If `-' can be "last in a list" the close-list operator `]' cannot be
"last in that list".  Ex falso sequitur quodlibet.
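
The placement rules quoted from the manual can be tried out directly.
The following is only a minimal sketch and rests on an assumption: it
uses Python's `re` module rather than GNU regex, on the understanding
that Python applies the same placement rules to `]' and `-' inside a
list.  The helper name `matches` and the pattern variable names are
made up for the illustration.

```python
import re

# Assumption: Python's `re` stands in for GNU regex here; both treat
# `]` and `-` by position inside a bracketed list.

# A `]` placed first in the list is an ordinary list item.
first_bracket = re.compile(r"[]a]")

# A `-` placed first or last in the list is an ordinary list item.
leading_dash = re.compile(r"[-a]")
trailing_dash = re.compile(r"[a-]")

# Between two items, `-` is the range operator and not a list item.
inner_range = re.compile(r"[a-c]")


def matches(pattern, text):
    """Return True when PATTERN matches the whole of TEXT (hypothetical helper)."""
    return pattern.fullmatch(text) is not None
```

Under that assumption, `matches(first_bracket, "]")` and
`matches(trailing_dash, "-")` both return True, while
`matches(inner_range, "-")` returns False because the `-' there only
builds the range from `a' to `c'.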


If anyone's interested in how other languages handle regexp brackets
see the list below:

Perl's metacharacters are:
     { } [ ] ( ) ^ $ . | * + ? \

Python metacharacters are:
     . ^ $ * + ? { [ ] \ | ( )

PHP:
     Outside square brackets, the meta-characters are as follows:
     ...
     [ start character class definition
     ] end character class definition
     ...

XML:
     A metacharacter is either ., \, ?, *, +, {, } (, ), [ or ].

Tcl:
     A regular expression uses metacharacters (characters that assume special
     meaning for matching other characters) such as *, [], $ and ..
     ...
     A backslash (\) disables the special meaning of the following character,
     so you could match the string [Hello] with the RE \[Hello\].

Java (http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html):
     Perl is forgiving about malformed matching constructs, as in the
     expression *a, as well as dangling brackets, as in the expression
     abc], and treats them as literals.

     Java also accepts dangling brackets but is strict about dangling
     metacharacters like +, ? and *, and will throw a
     PatternSyntaxException if it encounters them.

Hence all of the regexp languages listed above consider `]' special,
and none of them lists `-' as special.  The Java doc calls the `]' in
`abc]' a dangling bracket.  The fact that languages "forgive" or
"accept" such constructs shouldn't cause anyone to promote such style.
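
To make the difference between "accepting" and "promoting" a dangling
bracket concrete, here is a small sketch, again assuming Python's `re`
module as the test bed (it is one of the languages listed above, but
it is not the Java or Perl engine the quoted text talks about).  The
helper name `compiles` is hypothetical.

```python
import re

# Assumption: Python's `re` is used as a stand-in engine.

# Outside a list, an unmatched `]` is accepted and taken literally.
dangling = re.compile(r"abc]")

# The quoted form says the same thing without relying on forgiveness.
quoted = re.compile(r"abc\]")

# Backslash-quoted brackets, as in the Tcl example.
hello = re.compile(r"\[Hello\]")


def compiles(regexp):
    """Return True when REGEXP is accepted by the engine (hypothetical helper)."""
    try:
        re.compile(regexp)
    except re.error:
        return False
    return True
```

Both `dangling` and `quoted` match the string `abc]`, so the quoted
form costs nothing.  In this engine `compiles("*a")` and
`compiles("[abc")` return False: a dangling `*' and an unterminated
list are rejected, while the dangling `]' is merely tolerated.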
