unofficial mirror of help-gnu-emacs@gnu.org
* Regular expressions for Unicode general categories
@ 2008-12-07 20:47 Derick Eddington
  2008-12-07 23:35 ` Peter Dyballa
  0 siblings, 1 reply; 3+ messages in thread
From: Derick Eddington @ 2008-12-07 20:47 UTC
  To: help-gnu-emacs

Hello,

I am writing an Emacs regular expression to match R6RS Scheme
"identifiers" (part of the syntax highlighting of a major mode I'm
making), and it needs to match characters based on their Unicode
general categories.  Emacs regular expressions do not seem to provide
a direct way to do that (I'm using Emacs 23.0.60.1, and I couldn't
find anything about this in the Info docs, on emacswiki.org, or in
this list's archives), so I computed regular-expression character sets
for the needed general categories (using `get-char-code-property') and
placed these in their positions in the larger regular expression.
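
A rough Python analogue of that computation, for illustration only: the original work used Emacs Lisp and `get-char-code-property', Python's `unicodedata` tables may follow a different Unicode version than Emacs 23, and the function names and the category list below are hypothetical (the list is my reading of the R6RS <constituent> rule, which applies to characters above U+007F). The sketch collapses a set of general categories into runs of consecutive code points and renders them as one bracket expression, which also shows how many ranges each such class needs:

```python
import re
import sys
import unicodedata

# Hypothetical constant: my reading of the R6RS <constituent> categories
# (they apply to characters above U+007F); check against the report.
R6RS_CONSTITUENT = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Nl", "No",
                    "Pd", "Pc", "Po", "Sc", "Sm", "Sk", "So", "Co"}


def category_ranges(categories, first=0):
    """Return (start, end) runs of code points >= `first` whose general
    category is in `categories`."""
    ranges = []
    start = None
    for cp in range(first, sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) in categories:
            if start is None:
                start = cp  # open a new run
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    return ranges


def ranges_to_char_class(ranges):
    """Render runs as a Python-regex bracket expression, escaping each endpoint."""
    parts = []
    for lo, hi in ranges:
        if lo == hi:
            parts.append(re.escape(chr(lo)))
        else:
            parts.append(re.escape(chr(lo)) + "-" + re.escape(chr(hi)))
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    runs = category_ranges(R6RS_CONSTITUENT, first=128)
    print(len(runs), "ranges in the constituent class")
```

The printed count gives a sense of why a pattern that embeds several such classes can run into the "Regular expression too big" limit.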

The problem is that I can't use it, because I get this error:
  Error during redisplay: (invalid-regexp Regular expression too big)
which is understandable: the general-category character sets are huge
and I use several of them, and I suspect the result would have been
too inefficient anyway.

So, what can I do?  If the Emacs regular-expression backslash
construct `\cC' supported Unicode general categories, or if some other
construct did, I think that would solve this nicely.  Is that planned,
or should I resort to more manual parsing, or something else?
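
The "more manual parsing" route might look roughly like this: test each character's general category directly instead of enumerating whole categories inside a regexp. In Emacs, font-lock-keywords accepts a function in place of a regexp as the MATCHER, and such a function could call (get-char-code-property CHAR 'general-category). The following is only a language-neutral sketch in Python under stated assumptions: the constants are my reading of the R6RS identifier grammar, inline hex escapes are ignored, `str.isspace()` stands in for R6RS <whitespace>, and `identifier_end` is a hypothetical name. Requiring a delimiter after the token is what keeps number-like text such as "+5" from being taken as the identifier "+":

```python
import unicodedata

# Hypothetical constants: my reading of the R6RS lexical syntax
# (verify against the report).  Inline hex escapes like \x41; are ignored.
CONSTITUENT_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Nl", "No",
                          "Pd", "Pc", "Po", "Sc", "Sm", "Sk", "So", "Co"}
SUBSEQUENT_CATEGORIES = {"Nd", "Mc", "Me"}
SPECIAL_INITIAL = set("!$%&*/:<=>?^_~")
SPECIAL_SUBSEQUENT = set("+-.@")
DELIMITERS = set('()[]";#')


def is_initial(ch):
    if ("a" <= ch <= "z") or ("A" <= ch <= "Z") or ch in SPECIAL_INITIAL:
        return True
    # Non-ASCII constituents are selected by general category.
    return ord(ch) > 127 and unicodedata.category(ch) in CONSTITUENT_CATEGORIES


def is_subsequent(ch):
    return (is_initial(ch) or "0" <= ch <= "9" or ch in SPECIAL_SUBSEQUENT
            or unicodedata.category(ch) in SUBSEQUENT_CATEGORIES)


def is_delimiter(ch):
    # str.isspace() only approximates R6RS <whitespace>.
    return ch in DELIMITERS or ch.isspace()


def identifier_end(text, pos):
    """Return the end index of an identifier starting at `pos`, or None.

    The token must be followed by a delimiter or the end of `text`.
    """
    n = len(text)

    def scan_subsequents(i):
        while i < n and is_subsequent(text[i]):
            i += 1
        return i

    if pos >= n:
        return None
    if text.startswith("->", pos):      # peculiar identifier -> <subsequent>*
        end = scan_subsequents(pos + 2)
    elif text.startswith("...", pos):   # peculiar identifier ...
        end = pos + 3
    elif text[pos] in "+-":             # peculiar identifiers + and -
        end = pos + 1
    elif is_initial(text[pos]):         # <initial> <subsequent>*
        end = scan_subsequents(pos + 1)
    else:
        return None
    if end < n and not is_delimiter(text[end]):
        return None
    return end
```

For example, identifier_end("(a b)", 1) returns 2, while identifier_end("1+", 0) returns None because "1" cannot start an identifier.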

JTMI: identifiers need to be recognized using their complete lexical
specification because I'm also highlighting numbers, whose lexical
syntax overlaps with that of identifiers, so identifiers need to be
fontified first to keep them from being partially fontified as
numbers.

Thank you for your help,

-- 
: Derick

Thread overview: 3+ messages
2008-12-07 20:47 Regular expressions for Unicode general categories Derick Eddington
2008-12-07 23:35 ` Peter Dyballa
2008-12-08  0:49   ` Derick Eddington
