From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Character folding in the pretest
Date: Mon, 08 Feb 2016 19:48:15 +0200
Message-ID: <8360xzqejk.fsf@gnu.org>
To: Marcin Borkowski
Cc: ofv@wanadoo.es, lokedhs@gmail.com, emacs-devel@gnu.org
In-reply-to: <877fifs3fy.fsf@mbork.pl> (message from Marcin Borkowski on Mon, 08 Feb 2016 15:05:05 +0100)

> From: Marcin Borkowski
> Date: Mon, 08 Feb 2016 15:05:05 +0100
> Cc: ofv@wanadoo.es, lokedhs@gmail.com, emacs-devel@gnu.org
>
> Just as another datapoint in discussion: for me, searching for "l" and
> finding "ł" seems a bit weird. (The opposite even more so.)

Which is why neither one happens under character folding.

> BTW, strangely enough, here isearching for "l" does /not/ find "ł", but
> isearching for "a" (with character folding on) finds "ą". Whatever one
> thinks about char folding, this is clearly a bug.

It's not a bug; it's the feature working as designed. We only fold
characters that have suitable decompositions in the Unicode Character
Database. So:

  (get-char-code-property ?ą 'decomposition)
    => (97 808)

but

  (get-char-code-property ?ł 'decomposition)
    => (322)

IOW, ą is canonically equivalent to the 2-character sequence "a" (97)
followed by U+0328 COMBINING OGONEK (808), which is why searching for
"a" finds ą. By contrast, ł has no canonical decomposition (nor any
other decomposition): the property value is just ł itself (322 is
U+0142). This means that the Unicode guys decided that ł should not be
equivalent to any other sequence of characters, and therefore Emacs
doesn't find it unless you search for it literally.
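For anyone who wants to look at the same UCD data outside Emacs, here is a simplified sketch in Python using the standard `unicodedata` module. It is not Emacs's char-fold code: `base_char` and `folded_find` are hypothetical helpers written only for illustration, and, as a simplifying assumption, they follow canonical decompositions only. A buffer character matches a search character if it is that character or canonically decomposes down to it, so folding works in one direction only.

```python
import unicodedata


def base_char(ch: str) -> str:
    """Hypothetical helper: fold ch to its base character by following
    its canonical decomposition in the Unicode Character Database."""
    decomp = unicodedata.decomposition(ch)
    # Empty string means no decomposition at all (e.g. 'ł').
    # A leading '<tag>' marks a compatibility decomposition, which this
    # simplified sketch deliberately does not follow.
    if not decomp or decomp.startswith("<"):
        return ch
    # The first code point of a canonical decomposition is the base.
    first = chr(int(decomp.split()[0], 16))
    # Recurse, since the base may itself decompose further.
    return base_char(first)


def folded_find(needle: str, haystack: str) -> int:
    """Hypothetical helper: index of the first match of needle in
    haystack, where each haystack character may match either itself
    or its base character; -1 if there is no match."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if all(h == c or base_char(h) == c
               for h, c in zip(haystack[i:i + n], needle)):
            return i
    return -1


if __name__ == "__main__":
    print(repr(unicodedata.decomposition("ą")))  # '0061 0328', i.e. 97 808
    print(repr(unicodedata.decomposition("ł")))  # '': no decomposition
    print(folded_find("a", "ząb"))    # 1: "a" finds "ą"
    print(folded_find("l", "łódź"))   # -1: "l" does not find "ł"
    print(folded_find("ą", "zab"))    # -1: folding is one-way
```

Python reports the decomposition as hex code points ('0061 0328'), which are the same 97 and 808 that Emacs shows in decimal; for ł it returns an empty string where Emacs returns the character itself.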
If you want to know why ł doesn't have any decompositions, I suggest asking on the Unicode mailing list. I'm sure they had good reasons, most probably reasons that came from people who are experts in the Polish language and its intricacies. We just trust the results.