unofficial mirror of emacs-devel@gnu.org 
* regex and case-fold-search problem
From: Kenichi Handa @ 2002-08-23  6:25 UTC


While working on emacs-unicode, I noticed a very difficult
problem which also exists in the current emacs.

(let ((case-fold-search nil))
  (string-match "[Þ-ß]" "Þ")) => 0
(let ((case-fold-search nil))
  (string-match "[Þß]" "Þ")) => 0

(let ((case-fold-search t))
  (string-match "[Þ-ß]" "Þ")) => nil !!!
(let ((case-fold-search t))
  (string-match "[Þß]" "Þ")) => 0

When you see the output of M-x list-charset-chars RET
latin-iso8859-1 RET,  you'll soon find what's going on.

The relevant character codes are as follows:
	Þ (#x8DE)
	ß (#x8DF)
	(downcase ?Þ) == ?þ (#x8FE)
	(downcase ?ß) == ?ß (#x8DF)

This problem is not specific to non-ASCII chars; it's just
rarer to run into such a situation with ASCII chars.

(let ((case-fold-search nil))
  (string-match "[A-_]" "A")) => 0
(let ((case-fold-search t))
  (string-match "[A-_]" "A")) => nil
(let ((case-fold-search t))
  (string-match "[A_]" "A")) => 0
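
One way to read these results (an assumption on my part; the message itself does not spell out the mechanism) is that under case folding each endpoint of a range is downcased on its own, so the start can end up above the end and the range becomes empty.  With the codes listed above, [Þ-ß] would turn into #x8FE..#x8DF, and [A-_] into 97..95.  The helpers below are hypothetical and only do the arithmetic for the ASCII case; they are not how the regex engine is written:

```elisp
;; Hypothetical helpers, not part of Emacs: model a matcher that
;; downcases both endpoints of a character range independently.
(defun my-folded-range (from to)
  "Return the cons (FROM . TO) with both endpoints downcased."
  (cons (downcase from) (downcase to)))

(defun my-range-empty-p (range)
  "Return non-nil if RANGE, a cons (FROM . TO), contains no characters."
  (> (car range) (cdr range)))

;; ?A is 65 and ?_ is 95, so the literal range is fine.
(my-range-empty-p (cons ?A ?_))             ; => nil

;; After downcasing, ?A becomes ?a (97) while ?_ stays 95,
;; so the folded range runs backwards and matches nothing.
(my-range-empty-p (my-folded-range ?A ?_))  ; => t
```

The enumerated form [A_] is unaffected under this reading because each character is folded by itself and no ordering between them is involved.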

In my opinion, specifying ranges by chars is nonsense
because there should be no semantics in the order of
character codes.  But, anyway, we have to decide what to
do.

(1) Regard the above case as a bug, and fix it completely.
    As the current Emacs doesn't support a range spanning
    different charsets, I think the fix is difficult but not
    prohibitively so.  But in emacs-unicode we can't have
    such a restriction, and thus the fix is very difficult.

(2) Regard the above case as an (unpleasant) feature, and
    document it.

(3) Signal an error for such a regex (and of course document
    it).
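
As a rough illustration of what option (1) might mean for a single range, here is a sketch that keeps the range endpoints as written and instead tests the candidate character together with its case variants.  This is only one possible semantics, chosen for illustration; the function name is hypothetical, and nothing here reflects how the fix was (or should be) done inside the regex engine:

```elisp
;; Hypothetical helper, not part of Emacs: case-insensitive membership
;; test for a character range whose endpoints are left untouched.
(defun my-char-in-range-fold-p (char from to)
  "Return non-nil if CHAR or one of its case variants lies in FROM..TO."
  (or (<= from char to)
      (<= from (downcase char) to)
      (<= from (upcase char) to)))

(my-char-in-range-fold-p ?A ?A ?_)  ; => t, 65 is inside 65..95
(my-char-in-range-fold-p ?a ?A ?_)  ; => t, via (upcase ?a) = 65
(my-char-in-range-fold-p ?{ ?A ?_)  ; => nil, 123 has no case variant in range
```

Testing per character sidesteps the backwards-range issue for a single range, but it says nothing about the cost of doing this inside a compiled pattern, which is presumably where the difficulty mentioned in (1) lies.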

---
Ken'ichi HANDA
handa@etl.go.jp


end of thread, other threads:[~2002-09-06 20:03 UTC | newest]

Thread overview: 40+ messages
2002-08-23  6:25 regex and case-fold-search problem Kenichi Handa
2002-08-23 15:56 ` Eli Zaretskii
2002-08-24  0:51   ` Kenichi Handa
2002-08-24  1:03     ` Miles Bader
2002-08-24  9:42       ` Eli Zaretskii
2002-08-24 16:16       ` Andreas Schwab
2002-08-26  1:54         ` Miles Bader
2002-08-26 16:11           ` Stefan Monnier
2002-08-26 21:51         ` Richard Stallman
2002-08-24  9:39     ` Eli Zaretskii
2002-08-26  1:29       ` Kenichi Handa
2002-08-26  2:31         ` Miles Bader
2002-08-25 22:21     ` Kim F. Storm
2002-08-23 17:36 ` Stefan Monnier
2002-08-23 21:52   ` Stefan Monnier
2002-08-24  1:16   ` Kenichi Handa
2002-08-25 18:52     ` Stefan Monnier
2002-08-26  1:56       ` Kenichi Handa
2002-08-24 10:40   ` Kai Großjohann
2002-08-26 21:51 ` Richard Stallman
2002-08-29  8:53   ` Kenichi Handa
2002-08-29 12:33     ` Kim F. Storm
2002-08-29 13:38       ` Kenichi Handa
2002-08-29 15:00         ` Kim F. Storm
2002-08-29 16:00         ` Stefan Monnier
2002-08-30  1:11           ` Kenichi Handa
2002-08-30 19:19             ` Richard Stallman
2002-08-30 19:19     ` Richard Stallman
2002-08-30 20:08       ` Stefan Monnier
2002-09-01 13:15         ` Richard Stallman
2002-09-01 16:26           ` Stefan Monnier
2002-09-02 14:54             ` Richard Stallman
2002-09-02 16:58               ` Stefan Monnier
2002-09-04 14:13                 ` Richard Stallman
2002-09-04 16:04                   ` Stefan Monnier
2002-09-05 18:02                     ` Richard Stallman
2002-09-06  1:00                       ` re-search-forward seems to be broken Miles Bader
2002-09-06 20:03                         ` Richard Stallman
2002-08-31  6:14       ` regex and case-fold-search problem Eli Zaretskii
2002-09-01 13:14         ` Richard Stallman
