From: Stephen Leake <stephen_leake@stephe-leake.org>
To: 1913@debbugs.gnu.org
Subject: bug#1913: Identifier after reserved word "raise" is not always
Date: Wed, 13 Jan 2010 03:03:24 -0500
Message-ID: <ueiluo2ib.fsf@stephe-leake.org>

It is clear that [a-zA-Z] does not match the characters permitted by
the Ada standard.

However, neither does [[:alpha:]] - consider this fragment:

procedure doµ 

The 'µ' (entered by C-x 8 u) is not matched by [[:alpha:]]*
(Emacs 23.1, Windows XP, LANG=C.UTF-8).

This could be fixed by the user; they can define µ to have word
syntax.

Ideally, we would have regular expression character ranges that match
the categories defined by ISO/IEC 10646:2003 (see LRM 2.1):

Letter, Uppercase
Letter, Lowercase
Letter, Titlecase
Letter, Modifier
Letter, Other
Mark, Non-Spacing
Mark, Spacing Combining
Number, Decimal
Number, Letter
Punctuation, Connector
Other, Format
Separator, Space
Separator, Line
Separator, Paragraph

These categories are used to define Ada lexical elements (LRM 2.2).
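
To make the list above concrete, here is a minimal sketch, in Python
rather than elisp, that maps a character to one of the listed
categories using the Unicode general-category codes. The names
ADA_CATEGORIES and ada_category are hypothetical, chosen only for this
illustration; nothing here is part of ada-mode or Emacs.

```python
import unicodedata

# Two-letter Unicode general-category codes for the categories listed
# above. The dictionary name is hypothetical, for illustration only.
ADA_CATEGORIES = {
    "Lu": "Letter, Uppercase",
    "Ll": "Letter, Lowercase",
    "Lt": "Letter, Titlecase",
    "Lm": "Letter, Modifier",
    "Lo": "Letter, Other",
    "Mn": "Mark, Non-Spacing",
    "Mc": "Mark, Spacing Combining",
    "Nd": "Number, Decimal",
    "Nl": "Number, Letter",
    "Pc": "Punctuation, Connector",
    "Cf": "Other, Format",
    "Zs": "Separator, Space",
    "Zl": "Separator, Line",
    "Zp": "Separator, Paragraph",
}


def ada_category(ch):
    """Return the listed category name for ch, or None if ch is in none of them."""
    return ADA_CATEGORIES.get(unicodedata.category(ch))


if __name__ == "__main__":
    # Print the category of each character in the fragment from this message.
    for ch in "doµ_1":
        print(repr(ch), ada_category(ch))
```

A regexp character class with these semantics would accept the 'µ' in
the fragment above as a lowercase letter without any user
configuration.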

But I don't think that's going to happen.

It seems the best compromise is to replace a-z etc with [:alpha:] or
[:alnum:] as appropriate, and hope the user knows how to define
characters to have word syntax. That's a lot of work, since each
modified regexp needs to be tested.
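
The gap between an ASCII range and a letter class can be shown outside
Emacs. The sketch below uses Python's re module, where [^\W\d_] is a
common idiom for "any Unicode letter"; it is only an analogy, since the
Emacs [:alpha:] class may behave differently, as the 'µ' example shows.
The pattern names are hypothetical.

```python
import re

# ASCII-only letters, as in the current ada-mode style of regexp.
ASCII_LETTERS = re.compile(r"[a-zA-Z]+")

# Python idiom for Unicode letters: word characters minus digits and
# underscore. This is an analogy for a letter class, not Emacs syntax.
UNICODE_LETTERS = re.compile(r"[^\W\d_]+")

name = "doµ"

# The ASCII range stops before the non-ASCII letter.
print(ASCII_LETTERS.match(name).group(0))

# The Unicode-aware class consumes the whole identifier.
print(UNICODE_LETTERS.match(name).group(0))
```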

As for matching leading underscores, I agree it would be nice to get
it right. Using shy groups (the elisp name for non-capturing groups)
would help, since they don't disturb the group numbering and are also
faster. If it doesn't complicate the testing, I'll try to do
that.
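
The effect of shy groups on group numbering can be sketched as
follows. This uses Python's re module, where a non-capturing group is
written (?:...); in elisp the equivalent is \(?:...\). Both patterns
are hypothetical and are not taken from ada-mode.

```python
import re

# With a capturing group for the optional package prefix, the final
# identifier ends up in group 3.
CAPTURING = re.compile(r"(raise)\s+(\w+\.)*(\w+)")

# Making the prefix group shy leaves the identifier in group 2, so the
# numbering is the same as if the prefix group had never been added.
SHY = re.compile(r"(raise)\s+(?:\w+\.)*(\w+)")

line = "raise Pkg.My_Error;"

print(CAPTURING.groups, CAPTURING.search(line).group(3))
print(SHY.groups, SHY.search(line).group(2))
```

Code that refers to groups by number, as font-lock keyword entries do,
keeps working when a shy group is added to a regexp, whereas adding a
capturing group shifts every later group number.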

Do you have suggestions about which regular expressions are more
important to be fixed? If you can provide typical code, and point out
the most annoying font-lock failures, that would be a good start.

-- 
-- Stephe





