From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: char equivalence classes in search - why not symmetric?
Date: Tue, 01 Sep 2015 19:16:32 +0300
Message-ID: <834mjecdy7.fsf@gnu.org>
References: <2a7b9134-af2a-462d-af6c-d02bad60bbe8@default>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
X-Trace: ger.gmane.org 1441124241 19640 80.91.229.3 (1 Sep 2015 16:17:21 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 1 Sep 2015 16:17:21 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Drew Adams <drew.adams@oracle.com>
In-reply-to: <2a7b9134-af2a-462d-af6c-d02bad60bbe8@default>
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
Xref: news.gmane.org gmane.emacs.devel:189394
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/189394>

> Date: Tue, 1 Sep 2015 08:46:26 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
>
> When character folding is turned on, shouldn't you be able to
> search for á and find (match) a, à, ã, ª, â, å, and ä?

No.  You should find only á.

> I think so.  Currently you cannot - you can only do the reverse:
> search for a and find any of the above.  a is treated specially.
> Why?

It's the same principle as with case-folding: if you type "FOO", you
will not find the lowercase variant.
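
To make the analogy concrete, here is a minimal Python sketch of that
asymmetric case rule.  It is only an illustration, not Emacs code: the
function name find_all is hypothetical, and it assumes text whose
lowercasing does not change its length (e.g. ASCII).

```python
def find_all(pattern, text):
    """Return the start indices of pattern in text.

    A pattern with no uppercase letters is matched case-insensitively;
    a pattern containing any uppercase letter is matched exactly.
    Assumes lowercasing does not change the length of the text.
    """
    case_sensitive = pattern != pattern.lower()
    haystack = text if case_sensitive else text.lower()
    hits = []
    start = haystack.find(pattern)
    while start != -1:
        hits.append(start)
        start = haystack.find(pattern, start + 1)
    return hits
```

With the text "foo Foo FOO", the pattern "foo" matches at all three
words, while the pattern "FOO" matches only the last one.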

> I suppose that the logic behind the current implementation is
> to mirror what we do with case-fold searching.  But is that the
> right thing in this case?

It's what the Unicode Standard recommends, and IMO it makes a lot of
sense.  See http://unicode.org/reports/tr10/#Searching.

> To me, folding a group of chars together for search purposes
> should be symmetric - go both ways.

You will see that the above Unicode report explicitly recommends
making it _asymmetric_.

> Why not?  Why, when char folding, treat plain a specially for
> searching?  Why not treat á, a, à, ã, ª, â, å, and ä the same?
> Isn't that the point here?  We are telling Isearch that they
> are equivalent.  Why pick one of them as the canonical
> search-pattern to use for finding any of them?  Why privilege
> a over á, a, à, ã, ª, â, å, and ä?

Because we are not "telling Isearch that they are equivalent".  We are
asking for matches that disregard the diacriticals (and, in the case
of ª, also higher-order collation-order variation).
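
A rough Python sketch of this asymmetric matching follows.  It is an
illustration under stated assumptions, not how Emacs implements the
feature: the names fold_char, char_matches and search_folded are
hypothetical, and "disregarding diacriticals" is approximated by
Unicode NFKD decomposition with the combining marks dropped.

```python
import unicodedata


def fold_char(ch):
    """Approximate the base form of ch: NFKD-decompose it and drop
    combining marks (an assumption made for this sketch)."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def char_matches(pattern_ch, text_ch):
    """Asymmetric rule: a pattern character that is already its own
    base form matches any text character that folds to it; any other
    pattern character matches only itself."""
    if fold_char(pattern_ch) == pattern_ch:
        return fold_char(text_ch) == pattern_ch
    return pattern_ch == text_ch


def search_folded(pattern, text):
    """Return the start indices where pattern matches text under the
    asymmetric rule, comparing character by character."""
    n = len(pattern)
    return [
        i
        for i in range(len(text) - n + 1)
        if all(char_matches(p, t) for p, t in zip(pattern, text[i:i + n]))
    ]
```

Searching the (precomposed) text "aáàãªâåä" for "a" matches at every
position, whereas searching for "á" matches only the á itself.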

> Now most of the time I, like most people, will be typing a
> instead of á into a search string.  But that's not really the
> point.  I think users should be able to use any members of an
> equivalence class of chars indifferently.

That'd make searching for exactly á unnecessarily complicated and/or
cumbersome, for no good reason.  The symmetry you suggest has no
practical advantages (because you can find all of these characters by
just specifying a), but does have significant practical disadvantages.

> This feature, welcome as it is, seems only half-baked, so far.

No need for derogatory language, thank you.  We certainly have a lot
to learn about this feature, but half-baked it isn't.