unofficial mirror of emacs-devel@gnu.org 
From: Richard Stallman <rms@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: regex and case-fold-search problem
Date: Mon, 26 Aug 2002 15:51:41 -0600 (MDT)
Message-ID: <200208262151.g7QLpfA12782@wijiji.santafe.edu>
In-Reply-To: <200208230625.PAA23426@etlken.m17n.org> (message from Kenichi Handa on Fri, 23 Aug 2002 15:25:42 +0900 (JST))

    In my opinion, specifying ranges by chars is nonsense
    because there should be no semantics in the order of
    character codes.

The fact is, people know the character codes and take advantage of
their knowledge.  I don't think this is unreasonable.  But that
question is academic, since the feature is used and we need to make it
work.

    Does that happen because, when case-fold-search is non-nil, the
    characters in the range specification are downcased?

It looks that way.
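One hypothetical way downcasing could break such a range is sketched below. It assumes that only the two endpoints are downcased, which is a guess for illustration and not a description of regex.c. Under that assumption A-_ becomes a-_, and since a (0x61) comes after _ (0x5F), the range is empty. The name naive_folded_range is made up for this sketch.

```python
def naive_folded_range(lo, hi):
    """Hypothetical model: downcase only the endpoints, then take every
    code point between them (NOT Emacs's actual regex.c logic)."""
    lo, hi = lo.lower(), hi.lower()
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}


print(naive_folded_range("A", "_"))  # set(), since 'a' (0x61) > '_' (0x5F)
```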

    > Maybe we can simply use the smallest contiguous
    > range of chars that includes all the chars we should match,

That isn't right.  The range should be equal to the disjunction of all
characters in it; A-_ should be equivalent to []A.....Z[\^_].  With
case folding, that should match A-Z, a-z, and [\]^_.  In other words,
the correct behavior is that all character codes that are equivalent
(when you ignore case) to any character in the originally specified
range should match.
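The matching rule above can be sketched as follows, assuming a toy ASCII-only case table. CASE_CLASSES, equivalents and folded_match are hypothetical names; Emacs would consult its real case table instead. The sketch also shows why the smallest contiguous range is too broad: the characters matched by [A-_] under case folding span A through z, yet the backquote (0x60) between Z and a must not match.

```python
import string

# Hypothetical toy case table covering ASCII letters only.
CASE_CLASSES = {}
for up, low in zip(string.ascii_uppercase, string.ascii_lowercase):
    CASE_CLASSES[up] = CASE_CLASSES[low] = {up, low}


def equivalents(ch):
    """All characters equal to ch when case is ignored, including ch."""
    return CASE_CLASSES.get(ch, {ch})


def folded_match(ch, lo, hi):
    """True if ch is case-equivalent to some character in lo..hi."""
    return any(ord(lo) <= ord(e) <= ord(hi) for e in equivalents(ch))


matched = {chr(c) for c in range(128) if folded_match(chr(c), "A", "_")}
# The smallest contiguous range covering every match is A..z,
# yet the backquote inside that span is not a match.
print(min(matched), max(matched), "`" in matched)  # A z False
```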

Given the whole case table, you can compute this by looping over the
original (non-case-folded) range and finding, for each character, all
the characters that are equivalent to it.  Then those could be
assembled into the smallest possible number of ranges.
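A sketch of this straightforward algorithm, again assuming a toy ASCII-only case table; fold_range and the table are hypothetical illustrations, not Emacs internals. It loops over every character of the original range, collects all case-equivalents, then merges the sorted code points into the fewest contiguous ranges.

```python
import string

# Hypothetical toy case table covering ASCII letters only; Emacs would use
# the buffer's real case table here.
CASE_CLASSES = {}
for up, low in zip(string.ascii_uppercase, string.ascii_lowercase):
    CASE_CLASSES[up] = CASE_CLASSES[low] = {up, low}


def equivalents(ch):
    """All characters equal to ch when case is ignored, including ch."""
    return CASE_CLASSES.get(ch, {ch})


def fold_range(lo, hi):
    """Return the fewest (start, end) ranges matching lo..hi with case folded."""
    # Step 1: loop over the original range, collecting every case-equivalent.
    codes = set()
    for c in range(ord(lo), ord(hi) + 1):
        codes.update(ord(e) for e in equivalents(chr(c)))
    # Step 2: merge the sorted code points into maximal contiguous ranges.
    ranges = []
    for code in sorted(codes):
        if ranges and code == ranges[-1][1] + 1:
            ranges[-1][1] = code
        else:
            ranges.append([code, code])
    return [(chr(a), chr(b)) for a, b in ranges]


print(fold_range("A", "_"))  # [('A', '_'), ('a', 'z')]
```

For A-_ this yields two ranges, A-_ and a-z, so the backquote between them is correctly excluded.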

A faster way, in the usual cases, would be to look for runs of
several consecutive characters that have just one case-sibling each,
where the siblings are consecutive too.  Each subrange of this kind can
be turned into two subranges, the original and the case-converted.
Also identify subranges of characters that have no case-siblings; each
subrange of this kind just remains as it is.  Finally, any unusual
characters that are encountered can be replaced with a list of all the
case-siblings.
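A sketch of the faster approach, under stated assumptions: a hypothetical toy case table covering ASCII letters plus the Greek sigma, whose three forms make it an "unusual" character with more than one case-sibling. fold_range_fast, siblings and single_sibling are illustrative names, not Emacs internals. The function scans the original range once, keeping each sibling-free run as it is, emitting each run of consecutive single-sibling characters as two subranges, and listing each unusual character together with its siblings.

```python
import string

# Hypothetical toy case table: ASCII letters, plus Greek capital sigma,
# small sigma and final sigma, which all fold together.
CASE_CLASSES = {}
for up, low in zip(string.ascii_uppercase, string.ascii_lowercase):
    CASE_CLASSES[up] = CASE_CLASSES[low] = {up, low}
SIGMAS = {"\u03a3", "\u03c3", "\u03c2"}
for s in SIGMAS:
    CASE_CLASSES[s] = SIGMAS


def siblings(code):
    """Code points of the case-siblings of `code`, excluding itself."""
    ch = chr(code)
    return {ord(s) for s in CASE_CLASSES.get(ch, {ch})} - {code}


def single_sibling(code):
    """The only case-sibling of `code`, or None if it has zero or several."""
    sib = siblings(code)
    return next(iter(sib)) if len(sib) == 1 else None


def fold_range_fast(lo, hi):
    """Case-folded equivalent of the range lo..hi as (start, end) pairs."""
    out = []
    c, end = ord(lo), ord(hi)
    while c <= end:
        sib = siblings(c)
        start = c
        if not sib:
            # Run of characters with no case-siblings: keep it as it is.
            while c < end and not siblings(c + 1):
                c += 1
            out.append((start, c))
        elif len(sib) == 1:
            # Run of single-sibling characters whose siblings are consecutive:
            # emit the original subrange and the case-converted one.
            first = single_sibling(c)
            while c < end and single_sibling(c + 1) == first + (c + 1 - start):
                c += 1
            out.append((start, c))
            out.append((first, first + (c - start)))
        else:
            # Unusual character: list it and all its case-siblings.
            out.extend((x, x) for x in sorted(sib | {c}))
        c += 1
    return sorted((chr(a), chr(b)) for a, b in out)


print(fold_range_fast("A", "_"))  # [('A', 'Z'), ('[', '_'), ('a', 'z')]
```

The output may contain overlapping or duplicate ranges (for example, for A-z); since the result is a disjunction that is harmless, though a real implementation might merge them.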

This too requires use of the whole case table.


Thread overview: 40+ messages
2002-08-23  6:25 regex and case-fold-search problem Kenichi Handa
2002-08-23 15:56 ` Eli Zaretskii
2002-08-24  0:51   ` Kenichi Handa
2002-08-24  1:03     ` Miles Bader
2002-08-24  9:42       ` Eli Zaretskii
2002-08-24 16:16       ` Andreas Schwab
2002-08-26  1:54         ` Miles Bader
2002-08-26 16:11           ` Stefan Monnier
2002-08-26 21:51         ` Richard Stallman
2002-08-24  9:39     ` Eli Zaretskii
2002-08-26  1:29       ` Kenichi Handa
2002-08-26  2:31         ` Miles Bader
2002-08-25 22:21     ` Kim F. Storm
2002-08-23 17:36 ` Stefan Monnier
2002-08-23 21:52   ` Stefan Monnier
2002-08-24  1:16   ` Kenichi Handa
2002-08-25 18:52     ` Stefan Monnier
2002-08-26  1:56       ` Kenichi Handa
2002-08-24 10:40   ` Kai Großjohann
2002-08-26 21:51 ` Richard Stallman [this message]
2002-08-29  8:53   ` Kenichi Handa
2002-08-29 12:33     ` Kim F. Storm
2002-08-29 13:38       ` Kenichi Handa
2002-08-29 15:00         ` Kim F. Storm
2002-08-29 16:00         ` Stefan Monnier
2002-08-30  1:11           ` Kenichi Handa
2002-08-30 19:19             ` Richard Stallman
2002-08-30 19:19     ` Richard Stallman
2002-08-30 20:08       ` Stefan Monnier
2002-09-01 13:15         ` Richard Stallman
2002-09-01 16:26           ` Stefan Monnier
2002-09-02 14:54             ` Richard Stallman
2002-09-02 16:58               ` Stefan Monnier
2002-09-04 14:13                 ` Richard Stallman
2002-09-04 16:04                   ` Stefan Monnier
2002-09-05 18:02                     ` Richard Stallman
2002-09-06  1:00                       ` re-search-forward seems to be broken Miles Bader
2002-09-06 20:03                         ` Richard Stallman
2002-08-31  6:14       ` regex and case-fold-search problem Eli Zaretskii
2002-09-01 13:14         ` Richard Stallman
