From: Kenichi Handa
To: eliz@is.elta.co.il
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: regex and case-fold-search problem
Date: Mon, 26 Aug 2002 10:29:29 +0900 (JST)
Message-ID: <200208260129.KAA27014@etlken.m17n.org>
In-Reply-To: <9743-Sat24Aug2002123958+0300-eliz@is.elta.co.il>

In article <9743-Sat24Aug2002123958+0300-eliz@is.elta.co.il>, "Eli Zaretskii" writes:

>> > Does that happen because under case-fold-search non-nil the
>> > characters on the range specification are downcased?
>>
>> Yes.

> Then perhaps, instead of downcasing the range, we should do the
> comparison in a case-insensitive manner? Or is that impossible with
> the current regex code?

Of course, it's not impossible. It's just not easy.

>> I mean that the concept of character range itself is not
>> good.

> As Miles wrote, it does make a perfect sense in a context of a
> specific language. For example, if the characters that designate the
> range are all Cyrillic characters, the range is sensible.

It makes sense only when we assume some character set (or locale).
For instance, in Emacs 21, Cyrillic characters have the same code
order as in iso-8859-5. But in emacs-unicode, we use Unicode, so a
Cyrillic char range that works well in Emacs 21 won't work in
emacs-unicode.
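To make the point about code order concrete, here is a small Python sketch (an illustration, not Emacs code). It takes the statement above that Emacs 21 orders Cyrillic characters as iso-8859-5 does, uses Python's standard `iso-8859-5` codec as a stand-in for that order (an assumption; it does not model Emacs 21's internal character codes), and compares which characters fall inside the same two-endpoint range under iso-8859-5 byte order and under Unicode code-point order. The function names are hypothetical.

```python
from __future__ import annotations

# Illustrative sketch only: a range is "whatever lies between two codes",
# evaluated under two different code orders. iso-8859-5 byte order stands
# in for Emacs 21's Cyrillic order (an assumption, not Emacs internals).


def range_by_iso8859_5(lo: str, hi: str) -> set[str]:
    """Characters whose iso-8859-5 byte lies between those of lo and hi."""
    lo_byte = lo.encode("iso-8859-5")[0]
    hi_byte = hi.encode("iso-8859-5")[0]
    return {bytes([b]).decode("iso-8859-5") for b in range(lo_byte, hi_byte + 1)}


def range_by_unicode(lo: str, hi: str) -> set[str]:
    """Characters whose Unicode code point lies between those of lo and hi."""
    return {chr(cp) for cp in range(ord(lo), ord(hi) + 1)}


if __name__ == "__main__":
    lo, hi = "\u0401", "\u040f"  # CYRILLIC CAPITAL LETTER IO .. DZHE
    old = range_by_iso8859_5(lo, hi)
    new = range_by_unicode(lo, hi)
    # Same endpoints, different members depending on the code order.
    print("only in iso-8859-5 order:", [f"U+{ord(c):04X}" for c in sorted(old - new)])
    print("only in Unicode order:   ", [f"U+{ord(c):04X}" for c in sorted(new - old)])
```

With the endpoints U+0401 and U+040F, both ranges have 15 members, but the iso-8859-5 one contains U+00AD SOFT HYPHEN (byte 0xAD), while the Unicode one contains U+040D instead, a letter iso-8859-5 does not encode. A range whose endpoints are "all Cyrillic" can therefore still mean different things under different character sets.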
> It would IMHO be a pity to lose the ability to specify ranges in such
> cases.

I'm not suggesting removing that ability. I'm just wondering whether
it is worth spending our time (and perhaps users' time) to make Emacs
handle a char range completely correctly, especially when
case-fold-search is t. I think something like Stefan's compromise
method (quoted below) is good enough.

> For ASCII it's pretty easy to fix. But for other charsets, it's
> indeed more tricky. Maybe we can simply use the smallest contiguous
> range of chars that includes all the chars we should match,
> so the behavior is indeed "implementation-defined" (in the sense
> that it's not necessarily obvious to the user what happens) but
> it's at least less confusing (in the sense that (case-fold-search t)
> matches at least as much as (case-fold-search nil)).

---
Ken'ichi HANDA
handa@etl.go.jp
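The three behaviors discussed in this thread can be compared with a small Python model of a single range `[lo-hi]`. This is a hypothetical sketch, not Emacs's regex code: every helper is invented for illustration, and case variants come from Python's `str.lower`/`str.upper` rather than Emacs case tables (an assumption). It contrasts downcasing the range endpoints (and the input), a true case-insensitive comparison, and one reading of Stefan's compromise that widens the range to the smallest contiguous span covering every case variant of its members.

```python
from __future__ import annotations

# Hypothetical model of one regex range [lo-hi]; NOT Emacs's regex.c.
# Case variants come from Python's str.lower/str.upper, an assumption
# standing in for Emacs's case tables.


def case_variants(ch: str) -> set[str]:
    """ch plus its single-character lowercase and uppercase forms."""
    forms = {ch, ch.lower(), ch.upper()}
    return {f for f in forms if len(f) == 1}  # skip e.g. "ß".upper() == "SS"


def downcase(ch: str) -> str:
    """Lowercase ch, keeping it unchanged if lowering yields several chars."""
    low = ch.lower()
    return low if len(low) == 1 else ch


def match_no_fold(ch: str, lo: str, hi: str) -> bool:
    """case-fold-search nil: compare raw codes."""
    return ord(lo) <= ord(ch) <= ord(hi)


def match_downcased_range(ch: str, lo: str, hi: str) -> bool:
    """Downcase both endpoints and the input, then compare codes."""
    return ord(downcase(lo)) <= ord(downcase(ch)) <= ord(downcase(hi))


def match_case_insensitive(ch: str, lo: str, hi: str) -> bool:
    """Match if any case variant of ch lies in the original range."""
    return any(match_no_fold(v, lo, hi) for v in case_variants(ch))


def compromise_range(lo: str, hi: str) -> tuple[str, str] | None:
    """Smallest contiguous range holding every case variant of every char in [lo-hi]."""
    if ord(lo) > ord(hi):
        return None  # an empty range stays empty
    codes = [ord(v) for cp in range(ord(lo), ord(hi) + 1)
             for v in case_variants(chr(cp))]
    return chr(min(codes)), chr(max(codes))


def match_compromise(ch: str, lo: str, hi: str) -> bool:
    """Plain code comparison against the widened range."""
    widened = compromise_range(lo, hi)
    return widened is not None and match_no_fold(ch, *widened)


if __name__ == "__main__":
    strategies = [match_no_fold, match_downcased_range,
                  match_case_insensitive, match_compromise]
    for lo, hi in [("@", "Z"), ("Z", "a")]:
        print(f"[{lo}-{hi}] widened to {compromise_range(lo, hi)}")
        for ch in "Z[b":
            results = {f.__name__: f(ch, lo, hi) for f in strategies}
            print(f"  {ch!r}: {results}")
```

In this model, `[@-Z]` with downcased endpoints behaves like `[@-z]` and wrongly accepts `[`, while `[Z-a]` with downcased endpoints becomes the empty `[z-a]` and rejects even `Z`, which matches with case-fold-search nil. The widened range for `[Z-a]` is `[A-z]`, which also accepts `b` although a true case-insensitive match would not; that overmatch is the "implementation-defined" part, traded for never matching less than case-fold-search nil.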