From mboxrd@z Thu Jan 1 00:00:00 1970
Path: main.gmane.org!not-for-mail
From: Richard Stallman
Newsgroups: gmane.emacs.devel
Subject: Re: regex and case-fold-search problem
Date: Sun, 01 Sep 2002 09:14:28 -0400
Sender: emacs-devel-admin@gnu.org
References: <200208230625.PAA23426@etlken.m17n.org>
 <200208262151.g7QLpfA12782@wijiji.santafe.edu>
 <200208290853.RAA03185@etlken.m17n.org>
 <1659-Sat31Aug2002091421+0300-eliz@is.elta.co.il>
Reply-To: rms@gnu.org
Cc: emacs-devel@gnu.org
Original-To: eliz@is.elta.co.il
In-Reply-To: <1659-Sat31Aug2002091421+0300-eliz@is.elta.co.il>
List-Id: Emacs development discussions.

> What about for Latin-2 characters? Will those regexp ranges
> change their meaning in emacs-unicode?

Yes. Latin-2 characters have different order in Unicode than in
8859-2. Those characters which are common to Latin-2 and Latin-1 are
in the same order, but those which aren't have different places. The
same goes for all the other Latin-N characters where N != 1.

This suggests that perhaps there is no need to be careful about
case-folding of ranges outside of ASCII and Latin-1.

We could have some code to map a range specified by a Lisp program
into a range of internal character codepoints (in Unicode Emacs, the
latter would be Unicode codepoints). We could make this code depend
on some user variable that states the external ordering meant by the
application. For example, Cyrillic users could tell Emacs that [A-Z]
was intended to work as in KOI8-R or as in 8859-5.

This is a coherent idea, but since it is a substantial amount of work,
the question is whether it is better to do this or do nothing about
those cases. I wonder how many programs use ranges of Latin-2 or
KOI8-R and depend on case-folding to work precisely. Probably few or
none.
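
The range-mapping idea above can be illustrated with a rough sketch. It
is written in Python rather than Emacs Lisp, it is not code from Emacs,
and the function name external_range_to_codepoints and its
external_charset argument (standing in for the proposed user variable)
are hypothetical. It assumes a single-byte external charset and simply
walks the byte values between the two endpoints.

```python
def external_range_to_codepoints(first, last, external_charset):
    """Expand the range first-last as ordered in external_charset.

    Returns the sorted Unicode codepoints of every character that lies
    between the two endpoints in the external (single-byte) ordering.
    Hypothetical helper, for illustration only.
    """
    lo = first.encode(external_charset)
    hi = last.encode(external_charset)
    if len(lo) != 1 or len(hi) != 1:
        raise ValueError("endpoints must be single-byte in the external charset")
    if lo[0] > hi[0]:
        raise ValueError("range is reversed in the external ordering")

    codepoints = []
    for byte in range(lo[0], hi[0] + 1):
        try:
            ch = bytes([byte]).decode(external_charset)
        except UnicodeDecodeError:
            continue  # byte value not assigned in this charset
        codepoints.append(ord(ch))
    return sorted(codepoints)


# The same Cyrillic range, U+0410 (A) to U+042F (YA), read under two
# different external orderings.
koi = external_range_to_codepoints("\u0410", "\u042f", "koi8_r")
iso = external_range_to_codepoints("\u0410", "\u042f", "iso8859_5")
print(len(koi), len(iso))
```

Under 8859-5 the range comes out as the contiguous block U+0410 to
U+042F, while under KOI8-R it covers a smaller, non-contiguous set of
letters, which is why the result has to be a set of codepoints rather
than a single internal range.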