From: Richard Stallman <rms@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: regex and case-fold-search problem
Date: Fri, 30 Aug 2002 15:19:14 -0400
Message-ID: <E17krIY-0004ow-00@fencepost.gnu.org>
In-Reply-To: <200208290853.RAA03185@etlken.m17n.org> (message from Kenichi Handa on Thu, 29 Aug 2002 17:53:53 +0900 (JST))
So, I agree with Stephen that his method is good enough.
It is wrong even for ASCII--we definitely must do something better, at
least for ASCII. The only question is, how much more than ASCII?
I think we all know that is the right behaviour, and at
least for ASCII, the latest code works that way. Perhaps we
should make Emacs work correctly also for Latin-1 chars,
because in emacs-unicode also, they have the same code
order.
What about for Latin-2 characters? Will those regexp ranges
change their meaning in emacs-unicode?
If so, perhaps we only need to make an effort to support ranges really
right for codes 0-256.
> A faster way, in the usual cases, would be to look for the case where
> several consecutive characters that have just one case-sibling each,
> and the siblings are consecutive too. Each subrange of this kind can
> be turned into two subranges, the original and the case-converted.
> Also identify subranges of characters that have no case-siblings; each
> subrange of this kind just remains as it is. Finally, any unusual
> characters that are encountered can be replaced with a list of all the
> case-siblings.
> This too requires use of the whole case table.
Implementing that for any range of characters consumes our
man-power and makes the running code slower.
It is not a very hard program to write, I think. I'd guess around 30
lines. However, you're right about the slowness for large ranges. If
we only do this for codes 0-256 (or, currently, for ASCII and
Latin-1), then it won't be too slow.
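For illustration, the quoted subrange method could be sketched roughly as
follows. This is only a sketch in Python, not the Emacs regex code: the names
fold_range and default_siblings are hypothetical, and Python's str.upper and
str.lower stand in for the Emacs case table, which holds different data.

```python
def default_siblings(cp):
    """Case siblings of code point cp.

    Assumption: Python's str.upper/str.lower stand in for the Emacs
    case table. Multi-character conversions are ignored.
    """
    ch = chr(cp)
    out = set()
    for other in (ch.upper(), ch.lower()):
        if len(other) == 1 and other != ch:
            out.add(ord(other))
    return out


def fold_range(lo, hi, siblings=default_siblings):
    """Expand the inclusive range lo..hi into case-insensitive subranges.

    Returns a list of (start, end) pairs: the original subranges first,
    then the subranges added by case conversion.
    """
    result = []
    extra = []  # subranges contributed by case conversion
    cp = lo
    while cp <= hi:
        sibs = siblings(cp)
        start = cp
        if not sibs:
            # Run of characters with no case-siblings: keep as it is.
            while cp + 1 <= hi and not siblings(cp + 1):
                cp += 1
            result.append((start, cp))
        elif len(sibs) == 1:
            # Run of characters with one sibling each, where the
            # siblings are consecutive too: emit both subranges.
            first = sib = next(iter(sibs))
            while cp + 1 <= hi and siblings(cp + 1) == {sib + 1}:
                cp += 1
                sib += 1
            result.append((start, cp))
            extra.append((first, sib))
        else:
            # Unusual character: list all of its case-siblings.
            result.append((cp, cp))
            extra.extend((s, s) for s in sorted(sibs))
        cp += 1
    return result + extra
```

With this sketch, fold_range(ord('a'), ord('z')) gives the original subrange
plus the upper-case one. Each call walks every character in the range once,
which is where the cost for large ranges comes from.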
Consider the situation that one writes this regexp
"[\000-\xffff]"
to search only Unicode BMP chars in emacs-unicode.
Do you think that is a reasonable kind of range that we
should try to support? If so, there goes my idea that
we only need to support ranges in 0-256 very well.
On the other hand, if we handle \000-\xffff by doing case conversion
carefully only for ASCII and Latin-1, and treat the rest of the range
in a less smart way, we would get the same results in this case.
Is that a good solution?
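A minimal sketch of the split proposed above, in Python: the part of a range
that falls in ASCII and Latin-1 is set aside for careful case conversion, and
the remainder is left for the less smart treatment. The name
split_for_folding is hypothetical, and taking "codes 0-256" to mean code
points 0 through 255 is an assumption.

```python
LATIN1_MAX = 0xFF  # assumption: "codes 0-256" means code points 0..255


def split_for_folding(lo, hi):
    """Split the inclusive range lo..hi at the end of Latin-1.

    Returns (careful, rest). Each is a (start, end) pair or None.
    'careful' is the part to case-convert exactly; 'rest' is the part
    that would be handled in the less smart way.
    """
    careful = (lo, min(hi, LATIN1_MAX)) if lo <= LATIN1_MAX else None
    rest = (max(lo, LATIN1_MAX + 1), hi) if hi > LATIN1_MAX else None
    return careful, rest
```

For the range \000-\xffff this yields (0, 255) as the careful part and
(256, 65535) as the rest, so the expensive per-character work stays bounded
at 256 characters however large the range is.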