From: Elias Mårtenson
Newsgroups: gmane.emacs.devel
Subject: Re: On language-dependent defaults for character-folding
Date: Sat, 20 Feb 2016 18:08:20 +0800
To: Eli Zaretskii
Cc: Lars Ingebrigtsen, emacs-devel
In-Reply-To: <83povrpwj6.fsf@gnu.org>

On 20 February 2016 at 17:21, Eli Zaretskii wrote:

> Your interpretation is wrong, because every implementation of
> character-folding in search uses normalization forms. So if you want
> to maintain that whoever does that is abusing normalization forms, you
> are not just up against Emacs, you are up against the ICU library and
> others. You are also up against http://www.unicode.org/notes/tn5/.

They may do so, but only because we're not exactly swimming in great
alternatives.

> It is possible that you only see the "equivalence" parts of all these
> sources. But in that case, you are actually claiming that folding
> characters should never be done at all! "Folding" means mapping
> _distinct_ character sequences to the same basic sequence. You start
> from a normalization form, then compare the results disregarding
> certain secondary, tertiary, etc. differences.

Of course. But the fact that you start from a normalisation form is of
secondary relevance here. I think that repeating the fact that the
normalised form is used has somewhat clouded the discussion.

When you say "ignoring [...] differences", how do you determine those
differences?
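To make the distinction concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not what Emacs or ICU actually do: it uses nothing but the standard unicodedata module, and the helper names canonical_equal and fold_diacritics are hypothetical. The first function only normalises; the second applies an extra rule (drop every combining mark) on top of the normalised form.

```python
import unicodedata

PRECOMPOSED = "\u00f1"   # ñ as a single code point
DECOMPOSED = "n\u0303"   # n followed by COMBINING TILDE


def canonical_equal(a: str, b: str) -> bool:
    """Canonical equivalence: compare after NFC normalisation only."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def fold_diacritics(s: str) -> str:
    """Hypothetical folding rule: decompose (NFD), then drop combining marks."""
    nfd = unicodedata.normalize("NFD", s)
    return "".join(c for c in nfd if not unicodedata.combining(c))


# Normalisation alone makes the two spellings of ñ compare equal...
print(canonical_equal(PRECOMPOSED, DECOMPOSED))
# ...but it does not make n compare equal to ñ.
print(canonical_equal("n", PRECOMPOSED))
# That only happens once a separate "ignore these marks" rule is applied.
print(fold_diacritics(PRECOMPOSED) == "n")
```

The last comparison succeeds only because of the rule inside fold_diacritics, and that rule is a choice made on top of normalisation. Which marks to ignore, and for which users, is exactly the part the normalisation form does not answer.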
> > Again (I really apologise for repeating myself, I'm starting to sound
> > like a troll and that is truly not my intention), the purpose of
> > normalisation forms are to ensure that the two variants of ñ compare
> > the same. It is not designed to provide a mechanism to allow n to
> > compare equal to ñ.
>
> Under character-folding that ignores diacritics, ñ should indeed
> compare equal to n.

Yes again. But how do you determine what rules to apply?

> > Sure, but doesn't it make sense to fall back to the user's default if
> > the buffer does not have an overriding locale?
>
> I don't know what you mean by "buffer has an overriding locale".
> Emacs buffers don't have a locale, and they cannot do that in
> principle because we support multiple languages. E.g., what could the
> locale of the HELLO buffer created by "C-h H" be?

I was not talking about what Emacs does today. I was speaking about the
hypothetical case where buffers can have unique locales. I can see a few
cases where that would be a neat thing to have, but I have to scrape the
barrel to do so.

> > As opposed to having no concept of locale at all?
>
> Yes. A multilingual environment cannot have a locale in principle.
> It will cease being multilingual if it does.

I guess we'll have to agree to disagree about this one. In any case,
it's for a different thread.

> > Strange, I always thought the data was there. Perhaps you should ask
> > a question on the Unicode mailing list, then.
> >
> > That's a good idea actually.
>
> That's a relief. I was beginning to suspect I don't have any good
> ideas at all.

Apparently I have given the impression that I think your ideas are
garbage. I profoundly apologise for this and will try to be better going
forward.

Regards,
Elias