From: Marcin Borkowski
To: Werner LEMBERG
Cc: ofv@wanadoo.es, lokedhs@gmail.com, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Character folding in the pretest
Date: Mon, 08 Feb 2016 15:05:05 +0100
Message-ID: <877fifs3fy.fsf@mbork.pl>
In-reply-to: <20160205.070103.162978216111829522.wl@gnu.org>

On 2016-02-05, at 07:01, Werner LEMBERG wrote:

>> How do you even define "optical similarities"?
>
> Basically the same as Eli has described: Base character plus
> diacritics, probably plus some basic shapes with `diacritics' that
> Unicode doesn't represent as composable: o → ø, l → ł, d → đ, etc.

Just as another datapoint in the discussion: for me, searching for "l"
and finding "ł" seems a bit weird.  (The opposite even more so.)  I
admit this might be nice for people without access to a Polish keyboard,
and in fact the most popular Polish keyboard layout is one where
"AltGr + l" stands for "ł", but they are really different letters, and
similarly with other such cases:

"łata" = "patch"
"lata" = "flies" (verb, as in "something flies")
"kąt" = "angle"
"kat" = "hangman"

Etc., etc.
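As an aside on the quoted remark about shapes "that Unicode doesn't
represent as composable": the sketch below, in Python rather than Emacs
Lisp, illustrates the difference.  It is not Emacs's character-folding
code; the helper name strip_marks is hypothetical, and the assumption is
only that folding is driven by canonical (NFD) decomposition.

```python
import unicodedata


def strip_marks(text):
    """Decompose text to NFD and drop combining marks.

    Hypothetical helper: a crude stand-in for decomposition-based
    folding, not what Emacs actually does.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for non-combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("kąt", "łata", "ø", "đ"):
        print(word, "->", strip_marks(word))
```

Under that assumption, "ą" decomposes into "a" plus a combining ogonek
and so folds to "a", whereas "ł", "ø" and "đ" have no canonical
decomposition and come back unchanged.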

BTW, strangely enough, here isearching for "l" does /not/ find "ł", but
isearching for "a" (with character folding on) finds "ą".  Whatever one
thinks about char folding, this is clearly a bug.

For Polish texts, I would rather turn char folding off.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University