From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)
Date: Fri, 06 Feb 2015 13:58:00 +0900
Message-ID: <87h9uzprfr.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87fvakvwbf.fsf@lifelogs.com>
To: emacs-devel

Artur Malabarba writes:

 > The bright side is that I think this two-char way of writing latin
 > accents is much less common (not 100% sure though, it's hard to
 > tell the difference).

Yes, it's less common if you take a random sample of the storage in
the world, but there are specific places where the canonical NFD form
is standardized, such as Apple's default file system (at least for
Mac OS). I'm not sure how common that is (NFC is more friendly to
casual hackers), but in any case there is a need to be able to deal
with decomposed characters because not all composition sequences have
precomposed forms.

I would assume that Emacs's character handling machinery knows about
this stuff, though, or at least the underlying libraries do. It's
probably just a matter of incorporating an appropriate library call.
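
To make the point about decomposed characters concrete, here is a rough sketch. It is in Python rather than Emacs Lisp, purely because its standard unicodedata module is easy to try out; it is not a proposal for how Emacs should implement this. The helper names strip_marks and fold_equal are made up for the illustration. The idea is to decompose to NFD first, then drop the nonspacing marks (general category Mn), so that precomposed and two-character spellings compare equal.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Return text with nonspacing marks removed (hypothetical helper).

    Decomposing to NFD first means a precomposed letter such as U+00E9
    becomes a base letter plus a combining mark, so both spellings are
    handled the same way.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )


def fold_equal(a: str, b: str) -> bool:
    """Compare two strings ignoring case and nonspacing marks (hypothetical)."""
    return strip_marks(a).casefold() == strip_marks(b).casefold()


precomposed = "\u00e9"    # e with acute as a single code point (NFC)
decomposed = "e\u0301"    # e followed by COMBINING ACUTE ACCENT (NFD)

# The two spellings differ as raw strings but agree after normalization.
print(precomposed == decomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
print(fold_equal("R\u00e9sum\u00e9", "re\u0301sume\u0301"))

# Not every sequence has a precomposed form: q + combining acute stays
# two code points even after NFC, so decomposed input cannot be avoided.
print(len(unicodedata.normalize("NFC", "q\u0301")))
```

Filtering on category Mn only is a simplification; a real search feature would have to decide what to do about enclosing and spacing combining marks, and about characters whose decomposition is a compatibility mapping rather than a canonical one.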