Re: regex and case-fold-search problem

all messages for Emacs-related lists mirrored at yhetil.org

From: Richard Stallman <rms@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: regex and case-fold-search problem
Date: Fri, 30 Aug 2002 15:19:14 -0400
Message-ID: <E17krIY-0004ow-00@fencepost.gnu.org>
In-Reply-To: <200208290853.RAA03185@etlken.m17n.org> (message from Kenichi Handa on Thu, 29 Aug 2002 17:53:53 +0900 (JST))

    So, I agree with Stephen that his method is good enough.

It is wrong even for ASCII--we definitely must do something better, at
least for ASCII.  The only question is, how much more than ASCII?

    I think we all know that is the right behaviour, and at
    least for ASCII, the latest code already works that way.
    Perhaps we should also make Emacs work correctly for
    Latin-1 chars, because they keep the same code order in
    emacs-unicode too.

What about for Latin-2 characters?  Will those regexp ranges
change their meaning in emacs-unicode?

If so, perhaps we only need to make the effort to support ranges
exactly right for codes 0-255.
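To make the Latin-1 versus Latin-2 question concrete, here is a small Python check (an illustration, not Emacs code). It assumes that emacs-unicode orders characters by Unicode code point and that current Emacs orders Latin-1 and Latin-2 characters by their ISO 8859 byte values; `unicode_points` is a hypothetical helper.

```python
def unicode_points(charset, lo=0xA0, hi=0xFF):
    """Return the Unicode code points of bytes lo..hi decoded in `charset`."""
    return [ord(bytes([b]).decode(charset)) for b in range(lo, hi + 1)]

latin1 = unicode_points("latin-1")
latin2 = unicode_points("iso8859_2")

# Latin-1 maps byte N to U+00NN, so a byte range stays one Unicode range.
print(latin1 == list(range(0xA0, 0x100)))  # True
# Latin-2 does not: the decoded code points are not even in ascending order.
print(latin2 == sorted(latin2))            # False
print([hex(c) for c in latin2[1:4]])       # ['0x104', '0x2d8', '0x141']
```

Under those assumptions, a Latin-2 range such as bytes 0xA1-0xA3 would cover U+0104, U+02D8 and U+0141, which are not contiguous in Unicode, so the range would change its meaning.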

    > A faster way, in the usual cases, would be to look for the case where
    > several consecutive characters that have just one case-sibling each,
    > and the siblings are consecutive too.  Each subrange of this kind can
    > be turned into two subranges, the original and the case-converted.
    > Also identify subranges of characters that have no case-siblings; each
    > subrange of this kind just remains as it is.  Finally, any unusual
    > characters that are encountered can be replaced with a list of all the
    > case-siblings.

    > This too requires use of the whole case table.

    Implementing that for arbitrary character ranges costs us
    manpower and makes the running code slower.

It is not a very hard program to write, I think.  I'd guess around 30
lines.  However, you're right about the slowness for large ranges.  If
we only do this for codes 0-255 (or, currently, for ASCII and
Latin-1), then it won't be too slow.
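A minimal sketch of the subrange-splitting idea quoted above, limited to codes 0-255. It is written in Python for illustration, not as Emacs's C regex code; Python's `str.lower()` stands in for the Emacs case table (an assumption), and `siblings` and `fold_range` are hypothetical names.

```python
def siblings(c):
    """Other codes in 0..255 with the same lower case as code `c`.

    Assumption: Python's str.lower() stands in for Emacs's case table.
    """
    low = chr(c).lower()
    return [d for d in range(256) if d != c and chr(d).lower() == low]


def fold_range(lo, hi):
    """Sorted, merged (start, end) subranges matching lo..hi ignoring case.

    Only meant for 0 <= lo <= hi <= 255.
    """
    out = []
    c = lo
    while c <= hi:
        sibs = siblings(c)
        if len(sibs) > 1:
            # Unusual character: list it and every case-sibling individually.
            out.extend((s, s) for s in [c] + sibs)
            c += 1
            continue
        # delta is None for "no case-sibling", else the offset to the sibling.
        delta = sibs[0] - c if sibs else None
        start = c
        # Extend the run while the next code has the same kind of sibling.
        while c + 1 <= hi:
            nxt = siblings(c + 1)
            nxt_delta = nxt[0] - (c + 1) if len(nxt) == 1 else None
            if len(nxt) > 1 or nxt_delta != delta:
                break
            c += 1
        out.append((start, c))
        if delta is not None:
            # The case-converted copy of the same run.
            out.append((start + delta, c + delta))
        c += 1
    # Sort and merge overlapping or adjacent subranges.
    merged = []
    for s, e in sorted(out):
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


print(fold_range(ord("Z"), ord("a")))  # [(65, 65), (90, 97), (122, 122)]
print(fold_range(0xC0, 0xDE))          # [(192, 222), (224, 246), (248, 254)]
```

In the second call, the multiplication sign at 0xD7 has no case-sibling, so it breaks the upper-case run in two and only its own code is kept. With this stand-in table no code in 0-255 has more than one sibling, so the "unusual" branch is there for case tables that do.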

    Consider the case where someone writes this regexp
	    "[\000-\xffff]"
    to match only Unicode BMP chars in emacs-unicode.

Do you think that is a reasonable kind of range that we
should try to support?  If so, there goes my idea that
we only need to support ranges in 0-255 very well.

On the other hand, if we handle \000-\xffff by doing case conversion
carefully only for ASCII and Latin-1, and treat the rest of the range
in a less smart way, we would get the same results in this case.
Is that a good solution?
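The hybrid approach can be sketched in Python as well. Here "a less smart way" is assumed to mean that codes above 255 are kept as a plain range with no case conversion at all, which is only one possible reading. `fold_low`, `to_ranges` and `mixed_fold` are hypothetical helpers, and Python's `str.lower()`/`str.upper()` stand in for the case table.

```python
def fold_low(lo, hi):
    """Codes in lo..hi (within 0..255) plus their case variants in 0..255."""
    codes = set()
    for c in range(lo, hi + 1):
        ch = chr(c)
        for v in (ch, ch.lower(), ch.upper()):
            # Skip multi-char variants ("SS" for U+00DF) and codes above 255.
            if len(v) == 1 and ord(v) < 256:
                codes.add(ord(v))
    return codes


def to_ranges(codes):
    """Collapse a set of codes into sorted (start, end) ranges."""
    ranges = []
    for c in sorted(codes):
        if ranges and c == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], c)
        else:
            ranges.append((c, c))
    return ranges


def mixed_fold(lo, hi):
    """Fold case carefully for codes 0..255; keep codes above 255 as-is.

    Assumption: "less smart" means no case conversion above 255.
    """
    ranges = to_ranges(fold_low(lo, min(hi, 255))) if lo <= 255 else []
    if hi > 255:
        high = (max(lo, 256), hi)
        # All low codes are below 256, so only the last range can touch `high`.
        if ranges and high[0] <= ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], high[1]))
        else:
            ranges.append(high)
    return ranges


print(mixed_fold(0, 0xFFFF))   # [(0, 65535)]
print(mixed_fold(0x101, 0x101))  # [(257, 257)]
```

For [\000-\xffff] the result is the same single range as without folding. The second call shows the cost of this reading: U+0101 is not matched together with its upper-case form U+0100, because nothing above 255 is case-converted.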

Thread overview: 40+ messages
2002-08-23  6:25 regex and case-fold-search problem Kenichi Handa
2002-08-23 15:56 ` Eli Zaretskii
2002-08-24  0:51   ` Kenichi Handa
2002-08-24  1:03     ` Miles Bader
2002-08-24  9:42       ` Eli Zaretskii
2002-08-24 16:16       ` Andreas Schwab
2002-08-26  1:54         ` Miles Bader
2002-08-26 16:11           ` Stefan Monnier
2002-08-26 21:51         ` Richard Stallman
2002-08-24  9:39     ` Eli Zaretskii
2002-08-26  1:29       ` Kenichi Handa
2002-08-26  2:31         ` Miles Bader
2002-08-25 22:21     ` Kim F. Storm
2002-08-23 17:36 ` Stefan Monnier
2002-08-23 21:52   ` Stefan Monnier
2002-08-24  1:16   ` Kenichi Handa
2002-08-25 18:52     ` Stefan Monnier
2002-08-26  1:56       ` Kenichi Handa
2002-08-24 10:40   ` Kai Großjohann
2002-08-26 21:51 ` Richard Stallman
2002-08-29  8:53   ` Kenichi Handa
2002-08-29 12:33     ` Kim F. Storm
2002-08-29 13:38       ` Kenichi Handa
2002-08-29 15:00         ` Kim F. Storm
2002-08-29 16:00         ` Stefan Monnier
2002-08-30  1:11           ` Kenichi Handa
2002-08-30 19:19             ` Richard Stallman
2002-08-30 19:19     ` Richard Stallman [this message]
2002-08-30 20:08       ` Stefan Monnier
2002-09-01 13:15         ` Richard Stallman
2002-09-01 16:26           ` Stefan Monnier
2002-09-02 14:54             ` Richard Stallman
2002-09-02 16:58               ` Stefan Monnier
2002-09-04 14:13                 ` Richard Stallman
2002-09-04 16:04                   ` Stefan Monnier
2002-09-05 18:02                     ` Richard Stallman
2002-09-06  1:00                       ` re-search-forward seems to be broken Miles Bader
2002-09-06 20:03                         ` Richard Stallman
2002-08-31  6:14       ` regex and case-fold-search problem Eli Zaretskii
2002-09-01 13:14         ` Richard Stallman
