From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: char equivalence classes in search - why not symmetric?
Date: Tue, 01 Sep 2015 19:16:32 +0300
Message-ID: <834mjecdy7.fsf@gnu.org>
References: <2a7b9134-af2a-462d-af6c-d02bad60bbe8@default>
To: Drew Adams
Cc: emacs-devel@gnu.org

> Date: Tue, 1 Sep 2015 08:46:26 -0700 (PDT)
> From: Drew Adams
>
> When character folding is turned on, shouldn't you be able to
> search for á and find (match) a, à, ã, ª, â, å, and ä?

No.  You should find only á.

> I think so.  Currently you cannot - you can only do the reverse:
> search for a and find any of the above.  a is treated specially.
> Why?

It's the same principle as with case-folding: if you type "FOO", you
will not find the lowercase variant.

> I suppose that the logic behind the current implementation is
> to mirror what we do with case-fold searching.  But is that the
> right thing in this case?

It's what the Unicode Standard recommends, and IMO it makes a lot of
sense.  See http://unicode.org/reports/tr10/#Searching.

> To me, folding a group of chars together for search purposes
> should be symmetric - go both ways.

You will see that the above Unicode report explicitly recommends
making it _asymmetric_.

> Why not?  Why, when char folding, treat plain a specially for
> searching?  Why not treat á, a, à, ã, ª, â, å, and ä the same?
> Isn't that the point here?  We are telling Isearch that they
> are equivalent.  Why pick one of them as the canonical
> search-pattern to use for finding any of them?  Why privilege
> a over á, a, à, ã, ª, â, å, and ä?

Because we are not "telling Isearch that they are equivalent".  We are
asking for matches that disregard the diacriticals (and in case of ª
also higher-order collation-order variation).

> Now most of the time I, like most people, will be typing a
> instead of á into a search string.  But that's not really the
> point.  I think users should be able to use any members of an
> equivalence class of chars indifferently.

That'd make searching for exactly á unnecessarily complicated and/or
cumbersome, for no good reason.  The symmetry you suggest has no
practical advantages (because you can find all of these characters by
just specifying a), but does have significant practical disadvantages.

> This feature, welcome as it is, seems only half-baked, so far.

No need for derogatory language, thank you.  We certainly have a lot
to learn about this feature, but half-baked it isn't.
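
For illustration, the asymmetric matching described above can be
sketched in Python.  This is only a sketch under assumptions, not how
Isearch implements character folding: the helper names (base_char,
char_matches, fold_search) are hypothetical, and the text is assumed
to be in precomposed (NFC) form, one code point per character.

```python
import unicodedata


def base_char(ch: str) -> str:
    """Strip diacritics and compatibility formatting from one character.

    Uses NFKD so that, e.g., the ordinal indicator U+00AA also reduces
    to plain "a".  Hypothetical helper, for illustration only.
    """
    decomposed = unicodedata.normalize("NFKD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped or ch


def char_matches(pattern_ch: str, text_ch: str) -> bool:
    """Asymmetric comparison of one pattern char against one text char."""
    if base_char(pattern_ch) == pattern_ch:
        # Plain character in the pattern: accept every variant that
        # reduces to it (a matches á, à, ã, ª, ...).
        return base_char(text_ch) == pattern_ch
    # The pattern character already carries a diacritic (or similar):
    # only the exact same character matches.
    return text_ch == pattern_ch


def fold_search(pattern: str, text: str) -> int:
    """Return the index of the first match of pattern in text, or -1."""
    n = len(pattern)
    for i in range(len(text) - n + 1):
        if all(char_matches(p, t) for p, t in zip(pattern, text[i:i + n])):
            return i
    return -1
```

With this sketch, fold_search("a", "xá") finds the á, while
fold_search("á", "aàãªâåä") finds nothing: typing the accented
character asks for exactly that character, just as typing "FOO" asks
for exactly the uppercase form.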