From: "Stefan Monnier"
Newsgroups: gmane.emacs.devel
Subject: Re: regex and case-fold-search problem
Date: Mon, 26 Aug 2002 12:11:42 -0400
Message-ID: <200208261611.g7QGBgt24993@rum.cs.yale.edu>
References: <200208230625.PAA23426@etlken.m17n.org>
	<9003-Fri23Aug2002185625+0300-eliz@is.elta.co.il>
	<200208240051.JAA24648@etlken.m17n.org>
	<20020824010307.GA8549@gnu.org>
Original-To: Miles Bader
Cc: Andreas Schwab, Kenichi Handa, eliz@is.elta.co.il, emacs-devel@gnu.org

> Andreas Schwab writes:
> > |> Yeah, but character ranges make perfect sense in many local contexts.
> > |> E.g., [0-9], or [<0>-<9>] where <0> and <9> are `wide' digits from some
> > |> character set.
> >
> > What does [A-Z] mean in EBCDIC?  [0-9] is a special case, because ISO C
> > requires that 0,1,2,3,4,5,6,7,8,9 are consecutive in the execution
> > character set.  But in many locales the collating sequence -
> > contains more than just the upper case letters from the English alphabet.
>
> The question is not `does [A-Z] make sense?', but rather: `_if_ [A-Z]
> makes sense, does [a-z] make sense too?'
>
> That is, we aren't the ones writing [A-Z], it's lisp authors or users
> entering regexps or something.  If they want to enter a less-than-useful
> character range, that's their prerogative; however, emacs should avoid
> making what they enter _less_ meaningful because of the case-fold-search
> setting.
>
> My point was that perhaps in practice, the ranges that would get screwed
> up by case-fold-search are even less sensible than normal, meaning it's
> likely most people wouldn't (or shouldn't) use them, and we really don't
> need to worry about the issue.  [ASCII is probably a special case, since
> it's so well known that people actually do tend to specify weird ranges]
>
> [but it looks like maybe it will get fixed properly anyway...]

I agree that we shouldn't spend too much time on it.
The patch I installed does the following:
- Fix a few problems such as ``if the case-table mapped ?* to ?o
  then "\\(fo\\)*" used to only match "foo"''.
  Luckily such case-tables are not very common, so nobody noticed the problem.
- case-fold-search now works correctly for ranges in ASCII.
- case-fold-search still doesn't work correctly for ranges in non-ASCII,
  but it matches at least as much as when case-fold-search is nil:
  i.e. the range might include some chars which the user didn't expect,
  but it at least includes the chars which the user expected.
  The previous behavior was that the range could include some unexpected
  chars and could also fail to include some expected chars.

The current code matches at least as many strings as the previous one.
I think that's good enough for now,


	Stefan
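
For readers who want the "matches at least as much as when case-fold-search is nil" property spelled out, here is a minimal Python model of it. It is only a sketch under stated assumptions: it is not the regex.c code, it uses Python's own `str.lower`/`str.upper` in place of an Emacs case table, and all three function names are hypothetical. The third function shows one possible way a fold-the-endpoints shortcut can lose expected characters; whether the old Emacs code failed in exactly this way is not claimed.

```python
def in_range(ch, lo, hi):
    """Plain range test, modelling a match with case folding turned off."""
    return lo <= ch <= hi


def in_range_folded(ch, lo, hi):
    """Folded range test: accept ch if ch itself or one of its
    single-character case variants lies in [lo, hi].

    Because ch is always one of the candidates, this accepts everything
    the plain test accepts, and possibly more.
    """
    variants = {ch, ch.lower(), ch.upper()}
    # Some characters change length when their case changes (e.g. German
    # sharp s uppercases to "SS"); such variants are skipped in this model.
    return any(in_range(v, lo, hi) for v in variants if len(v) == 1)


def in_range_endpoints_folded(ch, lo, hi):
    """Hypothetical shortcut: lowercase both endpoints and the character.

    Fine for [A-Z], but for a range such as [Z-a] the folded endpoints
    become z..a, an empty range, so characters the user listed are lost.
    """
    return lo.lower() <= ch.lower() <= hi.lower()
```

With these definitions, `in_range_folded("q", "A", "Z")` accepts the lower-case letter, and `in_range_folded("_", "Z", "a")` still accepts the underscore that the plain test accepts, whereas `in_range_endpoints_folded("_", "Z", "a")` rejects it.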