From: Drew Adams
Newsgroups: gmane.emacs.devel
Subject: RE: Character folding in the pretest
Date: Thu, 4 Feb 2016 07:18:16 -0800 (PST)
To: "Dirk-Jan C. Binnema", emacs-devel

> > It would make sense to have the default based on the session's locale,
> > meaning that in a Swedish locale a, ä and å would be different and n and ñ
> > be different, but under a Spanish locale, the opposite would be true.
>
> Character equivalence is based on the language(s) of whatever is in your
> buffer, which might be correlated with your locale, but not more than
> that.
>
> Regardless, for the purpose of searching, my personal preference would
> be to make folding rather inclusive; I don't really care about the exact
> rules languages have come up with for what letters are considered "the same",
> I just care about what I, as a user, would find the easiest to match.
>
> So for instance, I'd like "angstrom" to match "Ångström" even though in
> Swedish, a/Å and o/ö are not the same. Somewhat similar to how
> languages' capitalization rules are ignored when searching
> case-insensitively. A few false positives are not much of a problem.
>
> That would also get my vote as a reasonable default for case-folding in
> searches. But I'll happily take any default, as long as there's a way to
> get the above behavior, preferably without having to change my locale.

Both of these posts (one saying that it should be possible to take the
locale into account, perhaps even for the default behavior; the other
adding that a user might have a personal preference) point to the
existence of multiple use cases, and to users needing to be able to
control the behavior, easily.

We can fine-tune the defaults at design time, to try to provide
reasonable behavior for most use cases and contexts, but users still
need to be able to easily customize the sets of equivalence classes.
And they should be able to have multiple sets of such sets, which they
can activate in different contexts (e.g. modes).

That is really where the design effort should be, at this point. We
have a basic char-folding mechanism, but we do not yet provide an easy
way for a user to customize the behavior, let alone to define and
switch among the various behaviors that s?he might want in different
contexts.
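To make the idea of switchable sets of equivalence classes a bit more concrete, here is a minimal sketch. It is in Python, not Emacs Lisp, and it is not how Emacs's char-folding code works. The profile names, the FOLD_PROFILES table, and the functions fold_char, fold and folded_match are all hypothetical. Each profile lists the characters that must stay distinct from their base letter; every other character is folded by Unicode decomposition with combining marks dropped, then case-folded. The contents of the profiles are illustrative assumptions, not authoritative linguistic rules.

```python
import unicodedata

# Hypothetical folding profiles: each maps a context name to the set of
# characters that must stay distinct from their base letter. An empty set
# gives the fully inclusive folding described in the quoted message.
# (Illustrative assumptions only, not real locale rules.)
FOLD_PROFILES = {
    "inclusive": frozenset(),
    "swedish": frozenset("åäöÅÄÖ"),
    "spanish": frozenset("ñÑ"),
}


def fold_char(ch, keep):
    """Fold CH to a lowercase base letter unless it is listed in KEEP."""
    if ch in keep:
        return ch.casefold()  # protected: only case is ignored
    decomposed = unicodedata.normalize("NFD", ch)
    # Drop combining marks (rings, tildes, diaereses, acute accents, ...).
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


def fold(text, profile="inclusive"):
    """Return TEXT folded according to the named PROFILE."""
    keep = FOLD_PROFILES[profile]
    # Compose first, so a decomposed "å" is still seen as one protected char.
    composed = unicodedata.normalize("NFC", text)
    return "".join(fold_char(ch, keep) for ch in composed)


def folded_match(needle, haystack, profile="inclusive"):
    """True if NEEDLE occurs in HAYSTACK after folding both with PROFILE."""
    return fold(needle, profile) in fold(haystack, profile)


if __name__ == "__main__":
    print(folded_match("angstrom", "Ångström"))             # True
    print(folded_match("angstrom", "Ångström", "swedish"))  # False
    print(folded_match("ano", "año", "spanish"))            # False
```

In this sketch a context such as a mode would just select a profile by name, and a user could add a personal profile (say, one that keeps ü distinct) by adding one entry to the table, without changing the locale.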