From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: On language-dependent defaults for character-folding
Date: Sat, 20 Feb 2016 12:44:01 +0200
Message-ID: <83h9h3pspa.fsf@gnu.org>
To: Elias Mårtenson
Cc: larsi@gnus.org, emacs-devel@gnu.org
In-reply-to: (message from Elias Mårtenson on Sat, 20 Feb 2016 18:08:20 +0800)

> Date: Sat, 20 Feb 2016 18:08:20 +0800
> From: Elias Mårtenson
> Cc: Lars Ingebrigtsen, emacs-devel
>
> > It is possible that you only see the "equivalence" parts of all these
> > sources. But in that case, you are actually claiming that folding
> > characters should never be done at all! "Folding" means mapping
> > _distinct_ character sequences to the same basic sequence. You start
> > from a normalization form, then compare the results disregarding
> > certain secondary, tertiary, etc. differences.
>
> Of course. But the fact that you start from a normalisation form is of
> secondary relevance here. I think that perhaps repeating the fact that
> the normalised form is used has somewhat clouded the discussion.
>
> When you say "ignoring [...] differences", how do you determine those
> differences?
>
> > > Again (I really apologise for repeating myself, I'm starting to
> > > sound like a troll and that is truly not my intention), the purpose
> > > of normalisation forms is to ensure that the two variants of ñ
> > > compare the same. It is not designed to provide a mechanism to
> > > allow n to compare equal to ñ.
> >
> > Under character-folding that ignores diacritics, ñ should indeed
> > compare equal to n.
>
> Yes again. But how do you determine what rules to apply?

Emacs currently ignores _any_ non-base differences, so ignoring them is
simple: we disregard any characters in the decomposition except the
first one, which is the base character.

Further improvements in this direction will need to access additional
Unicode properties (to properly order the combining marks), and
perhaps additional tables. But this is something to consider in the
future, and it will have to be done in C anyway; the regexp-based
implementation cannot cut it.

> > > That's a good idea actually.
> >
> > That's a relief. I was beginning to suspect I don't have any good
> > ideas at all.
>
> Apparently I have given the impression that I think your ideas are
> garbage. I profoundly apologise for this and will try to be better
> going forward.

My smilies are usually implicit, so no sweat.
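
For illustration only (this sketch is not from the original message and
is not Emacs code): the rule described above, "disregard any characters
in the decomposition except the first one", can be approximated with
Python's standard unicodedata module. The function names base_char and
fold are hypothetical, and using canonical decomposition (NFD) is an
assumption; the actual Emacs implementation is described in the message
only as regexp based.

```python
import unicodedata


def base_char(ch):
    """Return the first character of the canonical decomposition of ch.

    Assumption: NFD stands in for "the decomposition" in the message.
    """
    return unicodedata.normalize("NFD", ch)[0]


def fold(text):
    """Fold text by keeping base characters and dropping combining marks.

    Decomposing first means precomposed and already-decomposed input
    fold to the same result.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    precomposed = "ma\u00f1ana"   # ñ as a single code point
    combining = "man\u0303ana"    # n followed by COMBINING TILDE
    print(base_char("\u00f1"))
    print(fold(precomposed) == fold(combining) == "manana")
```

Under such a rule both spellings of ñ fold to plain n, which is exactly
the behaviour being debated in this thread: normalisation alone only
makes the two spellings of ñ equal to each other, while dropping
everything but the base character also makes them equal to n.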