From: Clément Pit--Claudel
Newsgroups: gmane.emacs.devel
Subject: Re: Character folding in the pretest
Date: Thu, 4 Feb 2016 12:27:49 -0500
Message-ID: <56B38A15.90703@gmail.com>
In-Reply-To: <87mvrg2zid.fsf@wanadoo.es>
To: emacs-devel@gnu.org

On 02/04/2016 11:47 AM, Óscar Fuentes wrote:
> Clément Pit--Claudel writes:
>
>> On 02/04/2016 10:59 AM, Óscar Fuentes wrote:
>>> After seeing the case I mentioned (`n' matching `ñ' in Spanish
>>> text) it is obvious that the feature is not ready for prime
>>> time.
>>
>> This is interesting. I guess it boils down to whether you're trying
>> to avoid false positives or false negatives. For me the strength of
>> this feature is that it lets me find virtually anything using a
>> dumb keyboard (one without easy access to accents); I don't care
>> too much about false positives (that is, I don't mind if ‘n’ finds
>> ‘ñ’). In that sense, it doesn't matter if letters "are different";
>> all that matters is whether they look different. I imagine that's
>> why the Unicode standard defined things that way. It seems this
>> behavior is consistent with that of most online search engines (I
>> tried Google, Bing, and DuckDuckGo; all return accented matches for
>> unaccented keywords).
>
> I see your point, but you are talking about accents all the time. In
> Spanish `n' and `ñ' are different letters. `n' matching `ñ' is no
> different than `p' matching `q'. I think that you will agree that
> some of us will see that behavior as a glaring bug.

I should have said diacritics instead of accents; sorry.
The difference between n matching ñ and p matching q is that graphically, ñ is n + ~ (it can also be encoded that way: ñ).

Here's another issue that character folding solves; I'd like your thoughts on it. Try to search the text of my message for 'n' and 'ñ', without any sort of character folding.

This will match n but not ñ: ñ.
This will match ñ but not n: ñ.

Note that the behaviour has nothing to do with Emacs; most applications will behave the same. The first ñ is using n + combining tilde, while the second is a single character ñ. Both are legal representations of the Spanish letter ñ. With character folding, both match 'n'. This is a much more logical default, I think. The same thing can be said for virtually every diacritic.

On a more personal note, I wouldn't see the character folding behaviour as a bug for French, where ç is quite different from c, and é is quite different from e.

>> I'm wary of smart solutions based on locale or buffer language.
>> It's not uncommon to be writing a single document in multiple
>> languages; especially if names are involved. Plus, it's not obvious
>> that a single set of settings is enough for each locale. For
>> example, one could argue that folding accents makes no sense in
>> French: ‘supprimé’ means ‘removed’, but ‘supprime’ means ‘removes’.
>> Yet it is not uncommon for people to write the latter for the
>> former, especially when using a dumb keyboard.
>
> I'm not sure how to fix this, but seeing similar reservations from
> other users, some language-dependent behavior is unavoidable.

I don't think so. An on-off switch seems enough to begin with. Language-dependent folding could be a separate feature; Unicode folding (the current implementation) would be a fine feature to start with, I think.
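
To make the point about the two encodings of ñ concrete, here is a minimal Python sketch. It is only an illustration of the Unicode behaviour described above, not how Emacs implements character folding; the `fold` helper is a hypothetical name, and it assumes that "folding" simply means decomposing to NFD and dropping combining marks.

```python
import unicodedata

decomposed = "n\u0303"   # 'n' followed by U+0303 COMBINING TILDE
precomposed = "\u00f1"   # U+00F1 LATIN SMALL LETTER N WITH TILDE


def fold(text):
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    nfd = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for non-combining characters.
    return "".join(ch for ch in nfd if not unicodedata.combining(ch))


# The two spellings look the same but are different code point sequences.
print(decomposed == precomposed)

# A plain substring search for 'n' finds only the decomposed form.
print("n" in decomposed, "n" in precomposed)

# Canonical normalization maps one spelling onto the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# After folding, both spellings reduce to a bare 'n'.
print(fold(decomposed), fold(precomposed))
```

The same helper would also turn ‘supprimé’ into ‘supprime’, which is exactly the kind of false positive discussed above.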