From: Elias Mårtenson
Newsgroups: gmane.emacs.devel
Subject: Re: On language-dependent defaults for character-folding
Date: Sun, 21 Feb 2016 14:28:40 +0800
To: Lars Ingebrigtsen
Cc: Eli Zaretskii, emacs-devel

On 21 February 2016 at 10:51, Lars Ingebrigtsen wrote:

> If that database gives us all that, then I'm all for using that database
> instead of creating our own, of course.  But why doesn't C-s o find ø,
> and C-s l find ł then?

Because under the Unicode decomposition rules, ø is not decomposable, and neither is ł. I can't explain why that is the case (probably because there is no reason to decompose it into an o plus a combining slash: after all, every language that uses ø treats it as a letter in its own right).

On a related note, I would expect a search for ö to match ø, and I suspect you would too.

In the thread on the Unicode mailing list, the recommendation seems to be to use the CLDR (http://cldr.unicode.org/). Of course, this assumes there is a locale, but the locale could easily be made customisable (defaulting to the user's locale).

Another poster on the same thread mentioned that the CLDR doesn't cover every case, but adding a set of exceptions on top of it shouldn't be hard. In any case, the result would be significantly better than what is implemented now.

Regards,
Elias
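A quick way to check the decomposition point is Python's standard `unicodedata` module. The minimal sketch below prints the canonical decompositions of ö, å, ø and ł, then folds text by applying NFD and dropping combining marks, with an optional exception table layered on top, in the spirit of the "CLDR plus a set of exceptions" suggestion. The `describe` and `fold` helpers and the `EXTRA_FOLDS` table are hypothetical illustrations: they are not Emacs code and the table is not CLDR data.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical exception table layered on top of Unicode decomposition,
# standing in for the locale-specific data CLDR (plus local fixes) might
# supply. These entries are illustrative only; they are NOT CLDR data.
EXTRA_FOLDS: Dict[str, str] = {"ø": "o", "Ø": "O", "ł": "l", "Ł": "L"}


def describe(ch: str) -> str:
    """Return the code point, Unicode name and canonical decomposition of ch."""
    decomp = unicodedata.decomposition(ch) or "(no decomposition)"
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomp}"


def fold(text: str, extra: Optional[Dict[str, str]] = None) -> str:
    """Fold text for matching: NFD-decompose, then drop combining marks.

    Characters without a canonical decomposition (such as ø and ł) pass
    through unchanged unless `extra` maps them explicitly.
    """
    extra = extra or {}
    out = []
    for ch in text:
        if ch in extra:
            out.append(extra[ch])
            continue
        # NFD splits e.g. "ö" into "o" + U+0308 COMBINING DIAERESIS.
        for part in unicodedata.normalize("NFD", ch):
            # Drop marks with a nonzero canonical combining class
            # (the diacritics that NFD split off); keep everything else.
            if unicodedata.combining(part) == 0:
                out.append(part)
    return "".join(out)


if __name__ == "__main__":
    for ch in "öåøł":
        print(describe(ch))
    print(fold("Smørrebrød, Łódź"))               # Smørrebrød, Łodz
    print(fold("Smørrebrød, Łódź", EXTRA_FOLDS))  # Smorrebrod, Lodz
```

With the exception table, ö and ø fold to the same key, so a search for one would find the other. Note that this naive scheme also makes both match a plain o, which a locale-aware scheme might deliberately avoid.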