From: Derick Eddington <derick.eddington@gmail.com>
To: bug-gnu-emacs@gnu.org
Subject: bug#1877: Request: Regular expressions that can match Unicode general categories
Date: Mon, 12 Jan 2009 12:38:12 -0800
Message-ID: <1231792692.22467.115.camel@eep>

A new Scheme major mode I've made [1] requires regular expressions that
can match characters by their Unicode general categories.  Emacs regular
expressions do not seem to provide a way to do that directly (I'm using
GNU Emacs 23.0.60.1).  I couldn't find anything about it in the Emacs
documentation, on emacswiki.org, or by asking on help-gnu-emacs@gnu.org
and searching that list's archives.  So currently I pre-compute character
sets for the needed general categories (using `get-char-code-property')
and splice them into the larger regular expressions.  This has three
problems.  First, including character sets for every general category I
need makes some of the regular expressions too large for Emacs, which
signals an error when trying to use them, so currently I'm not
supporting all of the required categories.  Second, the character sets
are duplicated across different regular expressions, and since they're
so large this bloats the code.  Third, I suspect that matching against
character sets this large is not very time-efficient.
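
To make the workaround concrete, here is a minimal sketch of the same
idea in Python rather than Emacs Lisp.  This is an illustration only, not
the code from the mode [1]: Python's standard `re` module also has no
general-category construct, so the character sets have to be pre-computed
in much the same way.  The names `ranges_by_category` and `charset` are
hypothetical, and `unicodedata.category` stands in for
`get-char-code-property'.

```python
import re
import sys
import unicodedata


def ranges_by_category():
    """Map each general category (e.g. "Lu") to a list of inclusive
    [first, last] code point ranges, computed in a single pass."""
    ranges = {}
    prev_cat = None
    for cp in range(sys.maxunicode + 1):
        cat = unicodedata.category(chr(cp))
        if cat == prev_cat:
            # Still inside the current run: extend its upper bound.
            ranges[cat][-1][1] = cp
        else:
            # A new run starts here for this category.
            ranges.setdefault(cat, []).append([cp, cp])
            prev_cat = cat
    return ranges


def charset(ranges):
    """Build a bracket expression such as "[A-Z...]" from a non-empty
    list of ranges."""
    parts = []
    for first, last in ranges:
        lo, hi = re.escape(chr(first)), re.escape(chr(last))
        parts.append(lo if first == last else lo + "-" + hi)
    return "[" + "".join(parts) + "]"


# Compute the tables once and reuse them in every larger expression.
RANGES = ranges_by_category()
UPPER = charset(RANGES["Lu"])
LOWER = charset(RANGES["Ll"])

# Hypothetical larger expression: one uppercase letter followed by any
# number of lowercase letters.
identifier = re.compile(UPPER + LOWER + "*")
```

Every expression that needs a category has to embed the full bracket
expression for it, which is where the size and duplication problems come
from; a built-in construct would replace each of these with a few
characters.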

If Emacs regular expressions had some construct, similar to the existing
`\cC' one, that matched a character by its general category, I think
that would solve all the above issues nicely.  PLT Scheme regular
expressions have this ability [2].  

[1]
https://code.launchpad.net/~derick-eddington/scheme-mode/derick-.emacs.d
[2] http://docs.plt-scheme.org/reference/regexp.html

Thank you for your work on Emacs and for your time,

-- 
: Derick
Thread overview: 4+ messages
2009-01-12 20:38 Derick Eddington [this message]
2019-09-30  7:45 ` bug#1877: Request: Regular expressions that can match Unicode general categories Lars Ingebrigtsen
2019-09-30  8:45   ` Eli Zaretskii
2021-11-14  6:28     ` Lars Ingebrigtsen
