From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Character folding in the pretest
Date: Fri, 05 Feb 2016 09:52:04 +0200
Message-ID: <83oabvvbkr.fsf@gnu.org>
References: <87mvrg2zid.fsf@wanadoo.es> <20160204.180523.769253593641901728.wl@gnu.org> <20160205.070103.162978216111829522.wl@gnu.org>
To: Elias Mårtenson
Cc: ofv@wanadoo.es, emacs-devel@gnu.org

> Date: Fri, 5 Feb 2016 14:36:13 +0800
> From: Elias Mårtenson
> Cc: Óscar Fuentes, emacs-devel
>
> What I did want to do was try to explain that even though there is a visual similarity between A, Ä and Å, to
> a Swedish speaker those similarities are no greater than those of q and k. And definitely much more different
> than W and V (which were, up until recently sorted under V in dictionaries and seen as simply a visual
> variation).
>
> > What you describe naturally leads to another user option: Don't handle
> > characters as `equal' (with a proper definition of `equal') that
> > aren't `equal' in the user's locale.
>
> This is exactly my point. And you have managed to compress hundreds of my words into a single, district
> sentence. Thank you.

We are not going by visual similarity, or any other arbitrary
criteria. We are using established rules specified by the UCD, the
Unicode Character Database, and the explanations that accompany it in
the standard itself. The main rule is that equivalent character
strings should match (when character folding is enabled).

That character equivalence is language-dependent is a truism that
doesn't need to be argued.
The plan is to have language-dependent variations as soon as Emacs
acquires good infrastructure for doing that in a useful manner. The
idea behind the current implementation was that this feature would be
useful even when it is language-agnostic, which is the lowest level of
compatibility cited in the Unicode Standard (so the Unicode Consortium
guys didn't think it to be a stupid idea).
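
To make "equivalent character strings should match" concrete, here is
a minimal sketch in Python. It is not the Emacs implementation. It
assumes that language-agnostic folding can be approximated by
decomposing each character with the UCD data and dropping the
combining marks; the names canonically_equivalent, fold and
folded_match are made up for this illustration.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """True if the two strings have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def fold(text: str) -> str:
    """Language-agnostic fold (an assumption for this sketch, not Emacs's rule set).

    Decompose with NFKD, drop characters with a non-zero combining
    class, then casefold the result.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_only.casefold()


def folded_match(needle: str, haystack: str) -> bool:
    """True if the folded needle occurs in the folded haystack."""
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    # Precomposed Å versus A followed by COMBINING RING ABOVE.
    print(canonically_equivalent("\u00c5", "A\u030a"))
    # A plain ASCII search string against text with diacritics.
    print(folded_match("martenson", "Elias Mårtenson"))
```

A language-dependent variant would presumably exempt some characters
from the fold for a given locale (Å, Ä and Ö for Swedish, say), which
is the part that needs the infrastructure mentioned above.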