From: Drew Adams
Newsgroups: gmane.emacs.devel
Subject: RE: char equivalence classes in search - why not symmetric?
Date: Tue, 1 Sep 2015 10:50:22 -0700 (PDT)
Message-ID: <38061f42-eaf1-47c6-b74d-f676ac952b18@default>
References: <2a7b9134-af2a-462d-af6c-d02bad60bbe8@default> <834mjecdy7.fsf@gnu.org>
In-Reply-To: <834mjecdy7.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

> > When character folding is turned on, shouldn't you be able to
> > search for á and find (match) a, à, ã, ª, â, å, and ä?
>
> No. You should find only á.

No reason?

> > I think so. Currently you cannot - you can only do the
> > reverse: search for a and find any of the above. a is treated
> > specially. Why?
>
> It's the same principle as with case-folding: if you type "FOO",
> you will not find the lowercase variant.

You're just echoing what it does, not supporting the behavior
with reasons. And I already mentioned what you say here.

> > I suppose that the logic behind the current implementation is
> > to mirror what we do with case-fold searching. But is that the
> > right thing in this case?
>
> It's what the Unicode Standard recommends, and IMO it makes a
> lot of sense. See http://unicode.org/reports/tr10/#Searching.

I don't see that, when reading that section. I do see that it
explicitly calls out that behavior as an _option_:

  8.2 Asymmetric Search

  Users often find asymmetric searching to be a useful option.

That users can find this optionally useful, I have no doubt.
And I wouldn't be against making it a user option in Emacs.

But I do not see anything in the section you cited that says
that this asymmetric behavior is required, or recommended.

In any case, Emacs is not beholden to any particular standard,
as RMS so often reminds us. The question is what is useful for
Emacs users. If you think "it makes a lot of sense" then you
should have no difficulty giving some of that sense. So far,
none; just appeals to authority.

> > To me, folding a group of chars together for search purposes
> > should be symmetric - go both ways.
>
> You will see that the above Unicode report explicitly recommends
> to make it _asymmetric_.

No, I do not see that. I see that the report points out that such
an optional behavior can be useful for some users. And it
specifically points out the case "When doing an asymmetric search",
making clear that there is also the case when NOT doing an
asymmetric search.

Obviously, for the simpler case of a symmetric search there is no
need for a section describing it - it is straightforward, whereas
the asymmetric search case takes some explaining. Which is
precisely what makes it more complex for users.

Nowhere in that report do I see that asymmetric search is the only,
or even the recommended, search behavior. It is explicitly pointed
out as an optional behavior.

But I read the section quickly, and you are the expert. Please
point to where I am mistaken.

> > Why not? Why, when char folding, treat plain a specially for
> > searching? Why not treat á, a, à, ã, ª, â, å, and ä the same?
> > Isn't that the point here?
> > We are telling Isearch that they
> > are equivalent. Why pick one of them as the canonical
> > search-pattern to use for finding any of them? Why privilege
> > a over á, à, ã, ª, â, å, and ä?
>
> Because we are not "telling Isearch that they are equivalent".

I think we should be. At least that should be one possibility.

> We are asking for matches that disregard the diacriticals
> (and in case of ª also higher-order collation-order variation).

No. You are asking for that only when you use a search pattern
that does not use the diacriticals. When you search with á in
the pattern you are NOT asking for matches that disregard the
diacriticals. And why not? So far, no reasons given.

I would favor being able not just to toggle between folded and
unfolded search but to cycle among folded-symmetric,
folded-asymmetric, and unfolded. Why not?

> > Now most of the time I, like most people, will be typing a
> > instead of á into a search string. But that's not really the
> > point. I think users should be able to use any members of an
> > equivalence class of chars indifferently.
>
> That'd make searching for exactly á unnecessarily complicated and/or
> cumbersome, for no good reason. The symmetry you suggest has no
> practical advantages (because you can find all of these characters by
> just specifying a), but does have significant practical disadvantages.

Assertions with no supporting reasons/examples.

> > This feature, welcome as it is, seems only half-baked, so far.
>
> No need for derogatory language, thank you.

Where I work, "half-baked" is used often, and it means not entirely
finished, whether that refers to dev, QA, doc, whatever. It is not
used in a derogatory way. And I made very clear that I welcome this
feature.

If you feel that "half-baked" in the context of software development
is derogatory then I apologize for using the term. Let me say it
this way: This feature, welcome as it is, seems not entirely finished.
Whether now or later, I would like to see it go further.

> We certainly have a lot to learn about this feature,

And to document. And hopefully to further develop in the future.

> but half-baked it isn't.

Certainly the doc is half-baked, if baked at all. And in terms
of the longer term goal of facilitating users modifying the
classes of chars that are treated equivalently, and of defining
their own sets of such classes, we are not there yet.

Saying this does not take away from the progress made so far.
This is a very welcome feature.
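
To make the three behaviors discussed in this message concrete
(unfolded, folded-asymmetric, folded-symmetric), here is a minimal
sketch in Python. It is only an illustration and is not how Isearch
implements character folding. The names `fold` and `matches` and the
`mode` strings are hypothetical, and "folding" is approximated here
by NFKD decomposition with the combining marks dropped.

```python
import unicodedata


def fold(ch: str) -> str:
    """Approximate a char's base form (assumption: NFKD minus combining marks)."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches(pattern_ch: str, text_ch: str, mode: str) -> bool:
    """Compare one pattern char with one text char under a given search mode."""
    if mode == "unfolded":
        # Exact comparison only.
        return pattern_ch == text_ch
    if mode == "symmetric":
        # Any member of an equivalence class matches any other member.
        return fold(pattern_ch) == fold(text_ch)
    if mode == "asymmetric":
        # A base char in the pattern matches the whole class;
        # any other char in the pattern matches only itself.
        if fold(pattern_ch) == pattern_ch:
            return pattern_ch == fold(text_ch)
        return pattern_ch == text_ch
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("unfolded", "asymmetric", "symmetric"):
        print(mode, matches("a", "á", mode), matches("á", "a", mode))
```

In this sketch only the "symmetric" mode lets a pattern containing á
find plain a, while "asymmetric" reproduces the behavior described
above, where a finds á but not the reverse.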