From: Per Starbäck
Newsgroups: gmane.emacs.devel
Subject: Re: Character folding in the pretest
Date: Sat, 6 Feb 2016 10:37:06 +0100
To: Artur Malabarba
Cc: "Dirk-Jan C. Binnema", Drew Adams, emacs-devel

Oscar Fuentes wrote:
> If a Spaniard inputs "sana" on a search box and "saña" is found, he
> will regard the software as either buggy, dumb or completely
> oblivious to Spanish culture.

Similar to my example of how a Swede would see a search for "varpa"
finding "värpa" or "varpå" (all three being existing, totally
different words).

When met with the "argument" that not many people speak Swedish
anyway, I replied that it was only an example of what I knew best, and
that there probably were similar examples in several other languages.
I'm glad to hear there is one in Spanish, one of the largest languages
of the world. Now let's count the number of affected people again! :)

That character folding is dependent on locale is of course well-known
by those who work on this.

Artur Malabarba wrote:
> FTR, like I've said a couple of times already, I will invest more time
> into making this customizable once I've seen how it's received.
> Also (and this I haven't said yet) I do plan on providing a better
> default depending on locale. When the time comes to actually implement
> it I'll explain why I prefer locale (over some notion of buffer-local
> language).

When Artur again confirmed that he is fine with having the new feature
turned off in Emacs 25, with the intention of having it turned on
later, after it has had enough testing, I thought this would finally be
settled. But evidently not yet...

Those opposed have argued as if this is something mandated by Unicode,
so that we can do nothing but follow. It doesn't matter if the result
is seen as buggy or dumb by users. "This feature is simply folding as
specified by the Unicode standard".

That is not so.
Of course the Unicode Consortium is well aware of the issues that I,
Oscar and others are pointing out, and that I'm sure Artur is well
aware of.

Eli Zaretskii:
> Perhaps you aren't familiar with Unicode equivalence, in which case I
> suggest these sources:
>
> http://unicode.org/reports/tr10/#Searching
> http://www.unicode.org/notes/tn5/
> http://www.unicode.org/reports/tr30/tr30-4.html

But of course these take up issues like the ones we have mentioned
here. The first one mentions the aa/å equivalence in Danish, for
example. And to quote the last one:

# In the general case, different search term foldings are applied for
# different languages. For example, accent distinctions are ignorable
# for some languages, but not for others. In English the accent in
# words like naïve is optional, while to a Swedish user 'o' and 'ö'
# are distinct letters.

That is, by the way, the last draft of a withdrawn technical report.

   Draft UTR #30: Unicode Character Foldings has been withdrawn. It
   was never formally approved; the last public version was a draft
   UTR, which can be found at
   http://www.unicode.org/reports/tr30/tr30-4.html.

That shows that the issues I, Oscar and others are mentioning are not
something new that we just thought of and that Unicode would somehow
have us ignore. It also shows that there *is* no technical report on
Unicode Character Foldings.

We have to break out of the circles this is going in.

John Wiegley wrote:
> A locale-based quotient for natural language text seems like a reasonable
> default, unless pretesting/polling shows us otherwise. However, there will
> always be times when you don't want it, or you want a different quotient
> altogether, or even various combinations of them.

Yes, that would be a good default, but it is not a default that we can
have in the next Emacs, though there are great prospects that we can
have it in the one after that.

Please John, put your foot down and don't let this continue ad
infinitum.
The options we have are instead:

(1) Let the default be as searching has worked before. Nothing gets
    worse for anyone. We'll have the start of a new exciting feature
    available, one that will be just right for many users and that
    will be tried by a lot of others as well, giving feedback for the
    continued development that Artur has written that he is already
    planning.

(2) Make the fundamental feature of searching work fundamentally
    differently out of the box, in a way that many users will see as
    neat, and many users will see as "buggy, dumb or completely
    oblivious to" the user's culture.
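
To make the locale dependence discussed above concrete, here is a minimal sketch in Python. It is not how Emacs implements character folding and is not taken from any Unicode specification. The names `fold`, `matches` and `DISTINCT_LETTERS` are hypothetical, and the per-locale table is an assumption made up for illustration from the examples in this message (ñ in Spanish; å, ä, ö in Swedish).

```python
import unicodedata

# ASSUMPTION: a hand-written, illustrative table of letters that a given
# locale treats as distinct base letters. Not derived from any standard.
DISTINCT_LETTERS = {
    "es": set(unicodedata.normalize("NFC", "ñÑ")),
    "sv": set(unicodedata.normalize("NFC", "åäöÅÄÖ")),
}


def fold(text, locale=None):
    """Strip combining marks, except from letters the locale keeps distinct."""
    keep = DISTINCT_LETTERS.get(locale, set())
    out = []
    # Work on precomposed characters so that "ñ" is one unit we can look up.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        # Decompose, then drop the combining marks (accents, tildes, rings).
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


def matches(query, candidate, locale=None):
    """True if query and candidate are equal after locale-aware folding."""
    return fold(query, locale) == fold(candidate, locale)
```

With no locale given, `matches("sana", "saña")` and `matches("varpa", "värpa")` are both true, which is the behaviour criticized above. With `locale="es"` or `locale="sv"` respectively they become false, while `matches("naive", "naïve", "sv")` stays true because ï is not in the Swedish table.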