From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Character folding in the pretest
Date: Sat, 06 Feb 2016 12:41:10 +0200
Message-ID: <837fiit92x.fsf@gnu.org>
To: Per Starbäck
Cc: djcb@djcbsoftware.nl, drew.adams@oracle.com, bruce.connor.am@gmail.com, emacs-devel@gnu.org

> Date: Sat, 6 Feb 2016 10:37:06 +0100
> From: Per Starbäck
> Cc: "Dirk-Jan C. Binnema", Drew Adams, emacs-devel
>
> From the opposers it has been argued as if this is something
> mandated by Unicode, so we can do nothing about it but to follow.

No one said anything like that. The references to the Unicode
Standard and its various data and TRs are to make the point that the
feature as implemented is based on sound principles and not on some
arbitrary criteria. No one said the feature is "mandated" in any way,
shape or form. Whether the feature should be turned on by default is
a matter only we the Emacs community will decide.

> It doesn't matter if the result is seen as buggy or dumb by
> users. "This feature is simply folding as specified by the Unicode
> standard".

The Unicode Standard specifies _how_ to fold during search. It also
includes recommendations _when_ to fold. It doesn't mandate anything,
and even if it did, we don't need to heed that. Your arguments in
this part are a red herring.

> That is not so. Of course the Unicode Consortium is well aware of the
> issues that I, Oscar and others are pointing out, and that I'm sure
> Artur is well aware of.

We are all aware of that; please give us credit that we know something
about the issues involved. It is you who seem to misunderstand
important aspects of this, see below.

> Eli Zaretskii:
> > Perhaps you aren't familiar with Unicode equivalence, in which case I
> > suggest these sources:
> >
> >   http://unicode.org/reports/tr10/#Searching
> >   http://www.unicode.org/notes/tn5/
> >   http://www.unicode.org/reports/tr30/tr30-4.html
>
> But of course these take up issues like we have mentioned here. The
> first one mentions the aa/å equivalence in Danish for example. And to
> quote the last one:
>
> # In the general case, different search term foldings are applied for
> # different languages. For example, accent distinctions are ignorable
> # for some languages, but not for others. In English the accent in
> # words like naïve is optional, while to a Swedish user 'o' and 'ö'
> # are distinct letters.

It seems that you have read only the parts that, in your eyes, confirm
your views, and skipped or dismissed the rest. And now you are
spreading your misunderstanding among others.

The facts are different. Unicode indeed recognizes that different
languages change the rules to some degree. However, it defines
several distinct degrees of conformance, and what we have now is the
lowest possible level of conformance, the one that is not tailored to
any particular language. See Section 3.8 of TR#10, referenced above,
and Table 13 there.

What we in fact implemented is the default collation weights, which
are independent of language tailoring. This is similar to the data we
use for case-folding: it doesn't include any language-specific
tailoring, and so in some cases, like the Turkish dotless i issue,
produces results that are incorrect in the context of some specific
languages. Still we use it, and it generally works very well.
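
To make the trade-off described above concrete, here is a minimal
Python sketch of a language-independent fold. It is not what Emacs
does: it approximates the idea with canonical decomposition (NFD) plus
removal of combining marks, rather than with the default collation
weights, and the names fold_diacritics and folded_contains are
hypothetical, chosen only for this illustration.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose to NFD, then drop combining marks.

    Assumption: this stands in for an untailored fold; it applies the
    same rule to every language.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def folded_contains(needle: str, haystack: str) -> bool:
    """Search after applying the same untailored fold to both strings."""
    def key(s: str) -> str:
        return fold_diacritics(s).casefold()
    return key(needle) in key(haystack)


if __name__ == "__main__":
    # English: the accent in "naïve" is optional, so the fold helps.
    print(folded_contains("naive", "a naïve approach"))
    # Swedish: 'o' and 'ö' are distinct letters, yet the untailored
    # fold still equates them.
    print(fold_diacritics("ö") == "o")
    # Turkish: untailored lowercasing maps "I" to dotted "i", not to
    # dotless "ı" (U+0131), which is the result Turkish text expects.
    print("I".lower() == "\u0131")
```

The same untailored rule that makes "naive" find "naïve" also makes
"o" find "ö", which is the kind of result that only language-specific
tailoring could avoid.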

In the long run, we should add language-specific tailoring to this and
other similar features. Currently, we lack the infrastructure for
doing that in a useful way, so this further development must wait.
But it doesn't mean the feature isn't useful as it is now, and several
participants in this thread explicitly said they like what the feature
gives them. Which doesn't surprise me, because it matches the advice
in the Unicode Standard, so I know we are on the right path.

> That is by the way the last draft of a withdrawn technical report.

(So why are you quoting from it and claiming that it supports your
POV? If it's indeed a useless, withdrawn draft, then it has no
relevance at all, right? Please decide whether you want to treat that
report seriously or not, and please be consistent with your decision.
Trying to have the cake and also eat it doesn't add credibility to
your opinions.)

> Draft UTR #30: Unicode Character Foldings has been withdrawn. It was
> never formally approved; the last public version was a draft
> UTR, which can be found at
> http://www.unicode.org/reports/tr30/tr30-4.html.

Actually, that draft was mentioned because it includes interesting and
important stuff not collected in one place in any other publication I
know of. I referred to it on the assumption that the reader would be
keenly interested in learning as much relevant background information
about the subject as possible, even if the report itself never made it
to official status.

> We have to break out of the circles this is going in.

There are no circles. We wanted to collect feedback, and we are
collecting it. The pretest has been going on for merely one week, and
the feedback we have already is useful, and it keeps coming in.
Stopping that and making the decision now makes no sense to me. The
release is still quite far away, and we have nothing to lose by
hearing from more people. Assuming we want to make an informed
decision, there's no rush.

> Please John, put your foot down and don't let this continue ad
> infinitum.

No one intends to continue "ad infinitum". That's another red
herring. We should continue collecting feedback for a couple more
pretest releases, that's all. Then we can make the decision based on
that feedback. I counted 10 people (excluding myself and Artur) who
expressed clear opinions in this thread; that is way too few for an
intelligent decision, IMO.

> The options we have are instead:
>
> (1) Let the default be as searching has worked before. Nothing gets
> worse for anyone.
>
> We'll have the start of a new exciting feature available, that will
> be just right for many users, and that will be tried by a lot of
> others as well, giving feedback for the continued development that
> Artur has written that he already is planning.
>
> (2) Make the fundamental feature searching work fundamentally
> different out of the box in a way that for many users will be seen as
> neat, and for many users will be seen as "buggy, dumb or completely
> oblivious to" the user's culture.

With all due respect, I don't think this is an objective description
of the alternatives.