From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Character folding in the pretest
Date: Sat, 06 Feb 2016 12:41:10 +0200
Message-ID: <837fiit92x.fsf@gnu.org>
References: <CADkQgvv2+HhCLeXkBLphPye-fy=S9qJocqn9AH=wJC4Zb9k-pg@mail.gmail.com>
	<87mvriuk3a.fsf@gmail.com>
	<jwvegcuhqr7.fsf-monnier+gmane.emacs.devel@gnu.org>
	<8737t9ex1p.fsf@petton.fr> <83oabxyf71.fsf@gnu.org>
	<56B230D1.90902@gmail.com> <m2k2ml3ezj.fsf@newartisans.com>
	<87bn7x4i4o.fsf@wanadoo.es>
	<CADtN0WJY=rSCeA=RGbAPs=goQyYkX8cqw_LHrY+ZQjOz0kgxgw@mail.gmail.com>
	<87d1sc4rin.fsf@djcbsoftware.nl>
	<bd2b2b07-d4be-4443-9e31-84ba3615ed58@default>
	<CAAdUY-LqpOpn8CQeP0Wi77mxdQyXm_DOgF_aGhMw9QsfBsac6w@mail.gmail.com>
	<CADkQgvv-ZwfCqkhUeapEENLAxi984ndLqEporZ1J8bTxUMwRMg@mail.gmail.com>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1454755310 2067 80.91.229.3 (6 Feb 2016 10:41:50 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 6 Feb 2016 10:41:50 +0000 (UTC)
Cc: djcb@djcbsoftware.nl, drew.adams@oracle.com, bruce.connor.am@gmail.com,
	emacs-devel@gnu.org
To: Per =?utf-8?Q?Starb=C3=A4ck?= <per.starback@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Feb 06 11:41:43 2016
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1aS0JC-00076s-DX
	for ged-emacs-devel@m.gmane.org; Sat, 06 Feb 2016 11:41:42 +0100
Original-Received: from localhost ([::1]:52895 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1aS0JB-0000wF-NU
	for ged-emacs-devel@m.gmane.org; Sat, 06 Feb 2016 05:41:41 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:50099)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1aS0J7-0000vx-6F
	for emacs-devel@gnu.org; Sat, 06 Feb 2016 05:41:38 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1aS0J3-0000K1-Vh
	for emacs-devel@gnu.org; Sat, 06 Feb 2016 05:41:37 -0500
Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:34386)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1aS0J3-0000Jx-SV; Sat, 06 Feb 2016 05:41:33 -0500
Original-Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:2728
	helo=home-c4e4a596f7)
	by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128)
	(Exim 4.82) (envelope-from <eliz@gnu.org>)
	id 1aS0J3-0000uV-3d; Sat, 06 Feb 2016 05:41:33 -0500
In-reply-to: <CADkQgvv-ZwfCqkhUeapEENLAxi984ndLqEporZ1J8bTxUMwRMg@mail.gmail.com>
	(message from Per =?utf-8?Q?Starb=C3=A4ck?= on Sat, 6 Feb 2016 10:37:06
	+0100)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:199405
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/199405>

> Date: Sat, 6 Feb 2016 10:37:06 +0100
> From: Per Starbäck <per.starback@gmail.com>
> Cc: "Dirk-Jan C. Binnema" <djcb@djcbsoftware.nl>,
> 	Drew Adams <drew.adams@oracle.com>, emacs-devel <emacs-devel@gnu.org>
> 
> From the opposers it has been argued as if this is something
> mandated by Unicode, so we can do nothing about it but to follow.

No one said anything like that.  The references to the Unicode
Standard and its various data and TRs are to make the point that the
feature as implemented is based on sound principles and not on some
arbitrary criteria.

No one said the feature is "mandated" in any way, shape or form.
Whether the feature should be turned on by default is a matter only
we, the Emacs community, will decide.

> It doesn't matter if the result is seen as buggy or dumb by
> users. "This feature is simply folding as specified by the Unicode
> standard".

The Unicode Standard specifies _how_ to fold during search.  It also
includes recommendations _when_ to fold.  It doesn't mandate anything,
and even if it did, we wouldn't need to heed it.  Your arguments in
this part are a red herring.

> That is not so. Of course the Unicode Consortium is well aware of the
> issues that I, Oscar and others are pointing out, and that I'm sure
> Artur is well aware of.

We are all aware of that; please give us credit for knowing something
about the issues involved.  It is you who seem to misunderstand
important aspects of this; see below.

> Eli Zaretskii:
> > Perhaps you aren't familiar with Unicode equivalence, in which case I
> > suggest these sources:
> >
> >   http://unicode.org/reports/tr10/#Searching
> >   http://www.unicode.org/notes/tn5/
> >   http://www.unicode.org/reports/tr30/tr30-4.html
> 
> But of course these take up issues like we have mentioned here. The
> first one mentions the aa/å equivalence in Danish for example. And to
> quote the last one:
> 
> #  In the general case, different search term foldings are applied for
> #  different languages. For example, accent distinctions are ignorable
> #  for some languages, but not for others. In English the accent in
> #  words like naïve is optional, while to a Swedish user 'o' and 'ö'
> #  are distinct letters.

It seems that you have read only the parts that confirm your views in
your eyes, and skipped or dismissed the rest.  And now you are
spreading your misunderstanding among others.

The facts are different.  Unicode indeed recognizes that different
languages change the rules to some degree.  However, it defines
several distinct degrees of conformance, and what we have now is the
lowest possible level of conformance, the one that is not tailored to
any particular language.  See Section 3.8 of TR#10, referenced above,
and Table 13 there.  What we in fact implemented is the default
collation weights, which are independent of language tailoring.
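
To make "independent of language tailoring" concrete, here is a minimal
sketch in Python.  It is not the Emacs implementation and not the full
collation algorithm of TR#10; it only approximates untailored matching
by decomposing characters and dropping combining marks.  The helper
names `fold` and `folded_find` are hypothetical.

```python
import unicodedata

def fold(text):
    """Hypothetical helper: untailored, language-independent folding."""
    # Canonically decompose, so that "ö" becomes "o" + U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (nonzero canonical combining class).
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore case as well, again without any language-specific rules.
    return base.casefold()

def folded_find(needle, haystack):
    """Return True if needle occurs in haystack after folding both."""
    return fold(needle) in fold(haystack)
```

Under this sketch a search for "o" finds "ö" whatever the user's
language, while "aa" does not find "å": both the Swedish and the Danish
expectations would need a tailoring layer on top of the default data.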

This is similar to the data we use for case-folding: it doesn't
include any language-specific tailoring, and so in some cases, like
the Turkish dotless i issue, it produces results that are incorrect in
the context of some specific languages.  Still we use it, and it generally
works very well.
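
The dotless i case can be sketched in Python, whose built-in case
conversion is likewise untailored.  This is an illustration only, not
Emacs code; `turkish_lower` is a hypothetical helper that applies just
the two Turkish-specific mappings before the default ones.

```python
DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I

def turkish_lower(text):
    """Hypothetical tailoring: Turkish lowercasing of the two i pairs."""
    # In Turkish, dotted capital I lowercases to "i" and plain "I" to dotless i.
    tailored = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    # Everything else follows the default, untailored mapping.
    return tailored.lower()

word = "DI\u015e"                  # D, I, S WITH CEDILLA
untailored = word.lower()          # "I" becomes the dotted "i"
tailored = turkish_lower(word)     # "I" becomes the dotless U+0131
```

The two results differ in one letter and are different words in
Turkish, which is the kind of error the untailored data produces.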

In the long run, we should add language-specific tailoring to this and
other similar features.  Currently, we lack the infrastructure for
doing that in a useful way, so this further development must wait.
But it doesn't mean the feature isn't useful as it is now, and several
participants in this thread explicitly said they like what the feature
gives them.  Which doesn't surprise me, because it matches the advice
in the Unicode Standard, so I know we are on the right path.

> That is by the way the last draft of a withdrawn technical report.

(So why are you quoting from it and claiming that it supports your POV?
If it's indeed a useless, withdrawn draft, then it has no relevance at
all, right?  Please decide whether you want to treat that report
seriously or not, and please be consistent with your decision.  Trying
to have the cake and also eat it doesn't add credibility to your
opinions.)

>   Draft UTR #30: Unicode Character Foldings has been withdrawn. It was
>   never formally approved; the last public version was a draft
>   UTR, which can be found at
>   http://www.unicode.org/reports/tr30/tr30-4.html.

Actually, that draft was mentioned because it includes interesting and
important stuff not mentioned in one place in any other publication I
know of.  I referred to it on the assumption that the reader would be
keenly interested in learning as much relevant background information
about the subject as possible, even if the report itself never made it
to the official status.

> We have to break out of the circles this is going in.

There are no circles.  We wanted to collect feedback, and we are
collecting it.  The pretest has been going on for merely one week, and the
feedback we have already is useful, and it keeps coming in.  Stopping
that and making the decision now makes no sense to me.  The release is
still quite far away, and we have nothing to lose by hearing from more
people.  Assuming we want to make an informed decision, there's no
rush.

> Please John, put your foot down and don't let this continue ad
> infinitum.

No one intends to continue "ad infinitum".  That's another red
herring.  We should continue collecting feedback for a couple more
pretest releases, that's all.  Then we can make the decision based on
that feedback.  I counted 10 people (excluding myself and Artur) who
expressed their clear opinions in this thread; that is way too few for
an intelligent decision, IMO.

> The options we have are instead:
> 
> (1) Let the default be as searching has worked before. Nothing gets
> worse for anyone.
> 
> We'll have the start of a new exciting feature available, that will be just
> right for many users, and that will be tried by a lot of others as well,
> giving feedback for the continued development that Artur has written
> that he already is planning.
> 
> (2) Make the fundamental feature searching work fundamentally
> different out of the box in a way that for many users will be seen as
> neat, and for many users will be seen as "buggy, dumb or completely
> oblivious to" the user's culture.

With all due respect, I don't think this is an objective description
of the alternatives.