From: Elias Mårtenson
Newsgroups: gmane.emacs.devel
Subject: Re: Character folding in the pretest
Date: Fri, 5 Feb 2016 13:09:03 +0800
To: Werner LEMBERG
Cc: Óscar Fuentes, emacs-devel
In-Reply-To: <20160204.180523.769253593641901728.wl@gnu.org>
References: <87vb6431rd.fsf@wanadoo.es> <56B37DF4.7000808@gmail.com> <87mvrg2zid.fsf@wanadoo.es> <20160204.180523.769253593641901728.wl@gnu.org>

On 5 Feb 2016 1:06 a.m., "Werner LEMBERG" <wl@gnu.org> wrote:
>
> This naturally leads to a possible user option: Having `optical'
> matches or not, where `optical' means `base character plus diacritic
> and/or slight modifications', e.g., o → ø → ö etc., etc.

I think this statement shows how easy it is to introduce cultural bias, although the fact that your name sounds German suggests that personal preference is involved.

How do you even define "optical similarities"? Should l and I compare the same under this definition? They certainly look similar. What about p and q? They look like mirror images of each other. What about z and s? They even sound similar. To a Swedish speaker, there are zero similarities between a, ä and å. They are, in fact, just as different as a and z are to an English speaker. I really cannot emphasise this enough, and reading this thread tells me that it needs to be emphasised even more.
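To make the point concrete, here is a Python illustration (not Emacs Lisp, and not the actual char-fold code) of one common way to build an "optical" fold: decompose each character with Unicode NFD and drop the combining marks. The function name `strip_diacritics` is hypothetical. It shows that such a rule folds a, ä and å together, yet leaves ø alone (U+00F8 has no canonical decomposition) and never relates l to I.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical 'optical' fold: decompose (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category Mn = nonspacing marks such as U+0308 (diaeresis) and U+030A (ring).
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


for ch in ["ö", "ä", "å", "ñ", "ø", "l", "I"]:
    print(ch, "->", strip_diacritics(ch))
# ö -> o, ä -> a, å -> a, ñ -> n, but ø -> ø (no decomposition), l -> l, I -> I
```

So a single decomposition-based rule conflates letters that Swedish treats as distinct, while leaving ø unfolded and l and I distinct.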

As someone who lives in an English-speaking country and uses English keyboards, while still working with documents in various languages, I see first-hand the need for ways to search for characters that I can't easily type on my keyboard. That issue, however, is orthogonal to character equivalence, and conflating the two is, in my opinion, the root cause of many of the disagreements in this thread.

My personal preference is that the expected behaviour of searches should depend more on the locale of the user than on the language of the document being searched. In other words, as a non-Spanish speaker, I'd expect to find ñ when searching for n, even if the document I'm searching is in Spanish. There are certainly countless counter-examples to this (enough to keep this thread going for another 100 messages, I'm sure), but there is at least a case for basing the default on the user's locale.
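A minimal Python sketch of the locale-based default suggested above, under loudly stated assumptions: the table `LOCALE_DISTINCT_LETTERS` and the functions `fold_char`, `fold` and `search` are hypothetical, and the table simply encodes which letters a given user's language treats as separate letters. Only the user's locale is consulted, never the document's language.

```python
import unicodedata

# Hypothetical per-locale table of letters that the user's language treats as
# separate letters; these are never folded to a base letter.
LOCALE_DISTINCT_LETTERS = {
    "sv": set("åäöÅÄÖ"),
    "es": set("ñÑ"),
}


def fold_char(ch: str, user_locale: str) -> str:
    """Fold one character for searching, honouring the user's locale."""
    if ch in LOCALE_DISTINCT_LETTERS.get(user_locale, set()):
        return ch
    # Otherwise strip diacritics: decompose, then drop nonspacing marks (Mn).
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def fold(text: str, user_locale: str) -> str:
    """Fold a whole string; NFC first so that å is seen as one character."""
    nfc = unicodedata.normalize("NFC", text)
    return "".join(fold_char(c, user_locale) for c in nfc)


def search(needle: str, haystack: str, user_locale: str) -> bool:
    """Return True if needle occurs in haystack after locale-aware folding."""
    return fold(needle, user_locale) in fold(haystack, user_locale)


print(search("n", "año", "en"))  # True
print(search("n", "año", "es"))  # False
print(search("a", "år", "sv"))   # False
```

Here an English-locale user finds "año" when searching for n, while Spanish- and Swedish-locale users keep ñ and å distinct from their base letters.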
