From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Derick Eddington <derick.eddington@gmail.com>
Newsgroups: gmane.emacs.help
Subject: Regular expressions for Unicode general categories
Date: Sun, 07 Dec 2008 12:47:13 -0800
Message-ID: <1228682833.4393.35.camel@eep>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
To: help-gnu-emacs@gnu.org
Xref: news.gmane.org gmane.emacs.help:60430
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/60430>

Hello,

I am writing an Emacs regular expression that matches R6RS Scheme
"identifiers" (for the syntax highlighting of a major mode I'm
making), and it needs to match characters based on their Unicode
general categories.  As far as I can tell, Emacs regular expressions
do not provide a way to do that directly: I'm using Emacs 23.0.60.1,
and I couldn't find anything about it in the Info docs, on
emacswiki.org, or in this list's archives.  So I computed regular
expression character sets for the needed general categories (using
`get-char-code-property') and placed them at the appropriate
positions in the larger regular expression.
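
A minimal sketch of how such a character set could be computed, for
illustration only.  The helper names `my-category-ranges' and
`my-category-charset' are hypothetical, the scan is limited to a
caller-supplied range to keep it fast, and it assumes a category
(such as a letter category) that contains none of the characters
special inside a bracket expression (`]', `^', `-'):

```elisp
;; Sketch only: `my-category-ranges' and `my-category-charset' are
;; hypothetical helper names, not part of Emacs.
(defun my-category-ranges (category from to)
  "Return (START . END) ranges of chars in FROM..TO in CATEGORY.
CATEGORY is a Unicode general category symbol such as `Lu'."
  (let ((ranges nil)
        (start nil)
        (c from))
    (while (<= c to)
      (if (eq (get-char-code-property c 'general-category) category)
          ;; Inside a run: remember where it began.
          (unless start (setq start c))
        ;; Outside a run: close the one that just ended, if any.
        (when start
          (push (cons start (1- c)) ranges)
          (setq start nil)))
      (setq c (1+ c)))
    ;; Close a run that extends to the end of the scanned range.
    (when start (push (cons start to) ranges))
    (nreverse ranges)))

(defun my-category-charset (category from to)
  "Return a bracket expression matching CATEGORY chars in FROM..TO.
Assumes the category contains no `]', `^', or `-'."
  (concat "["
          (mapconcat (lambda (r)
                       (if (= (car r) (cdr r))
                           (string (car r))
                         (string (car r) ?- (cdr r))))
                     (my-category-ranges category from to)
                     "")
          "]"))
```

For example, (my-category-charset 'Lu ?A ?z) evaluates to "[A-Z]".
Scanning the whole range 0..#x10FFFF for several categories is what
produces the very large sets.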

My problem is that I can't use it, because I get this error:
  Error during redisplay: (invalid-regexp Regular expression too big)
That is understandable, because the general category character sets
are huge and several of them are used, and I suspect the result would
have been too inefficient anyway.

So, what can I do?  If the Emacs regular expression backslash
construct `\cC' supported Unicode general categories, or if there
were some other construct that did, I think that would do it nicely.
Is that planned, or should I resort to doing more manual parsing, or
something else?

JTMI, the reason identifiers need to be recognized using their
complete lexical specification is that I'm also highlighting numbers,
whose lexical syntax overlaps with that of identifiers.  Identifiers
therefore need to be fontified first, so that they're not partially
fontified as numbers.
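
The ordering issue can be sketched as follows.  The variable name and
both regexps are simplified, hypothetical stand-ins (ASCII only, not
the full R6RS lexical syntax); the point is only that font-lock
applies keywords in order and, without an OVERRIDE flag, leaves
already-fontified text alone:

```elisp
;; Sketch only: `my-scheme-font-lock-keywords' and both regexps are
;; hypothetical, simplified stand-ins for the real R6RS patterns.
(defvar my-scheme-font-lock-keywords
  (list
   ;; 1. Identifiers first: a whole symbol starting with a letter.
   '("\\_<[a-zA-Z][a-zA-Z0-9!?*-]*\\_>" 0 'font-lock-variable-name-face)
   ;; 2. Numbers second.  This regexp alone would also match the
   ;;    "1" inside an identifier such as "x1", but that text has
   ;;    already been fontified by rule 1 and no OVERRIDE is given,
   ;;    so it is left untouched.
   '("[0-9]+\\(?:\\.[0-9]+\\)?" 0 'font-lock-constant-face))
  "Simplified font-lock keywords: identifiers before numbers.")
```

With the two entries swapped, the digit in "x1" would be fontified
as a number before the identifier rule ran.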

Thank you for any help,

-- 
: Derick
----------------------------------------------------------------