From: Ted Zlatanov
Newsgroups: gmane.emacs.devel
Subject: Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)
Date: Sat, 07 Feb 2015 07:59:40 -0500
Message-ID: <878ug9x4g3.fsf@lifelogs.com>
References: <87fvakvwbf.fsf@lifelogs.com> <83k2zvebvm.fsf@gnu.org>
To: emacs-devel@gnu.org

On Fri, 06 Feb 2015 09:29:33 +0200 Eli Zaretskii wrote:

>> From: Ted Zlatanov
>> Date: Thu, 05 Feb 2015 17:16:04 -0500
>>
>> https://emacs.stackexchange.com/questions/7992/how-to-search-an-arabic-word-in-text-without-its-diacritics-accents
>> suggested it would be useful if diacritics were ignored when searching
>> for text in various situations. This is similar to `case-fold-search'
>> but more generic. Here's what I suggested as the answer at the ELisp
>> level: ...

EZ> That doesn't do what we want, it's only a partial solution to that
EZ> problem.
EZ> E.g., it doesn't equate the initial, medial, and final
EZ> variants of the letters used by Arabic and other Semitic scripts.
EZ> Moreover, you cannot even search for "a" and find "á", AFAICS.

Thanks for explaining. I am certainly not an expert in this area and
don't even speak or write Arabic, but my solution did work for the given
parameters so I thought it might be useful.

EZ> The way to solve this correctly and generally was discussed here some
EZ> time ago, so if there are people here for whom this is an itch to
EZ> scratch, please let's do this as discussed there. We already have all
EZ> the necessary information for that in Emacs databases.

I am not one of those people. There's little I can contribute other
than this suggestion and testing for Romance languages with accents.

The general need seems to be for extending `case-fold-search', perhaps
with a new variable like `fold-search' that's a set of symbols. But I'm
sure you've already thought of that.

The performance concerns are justified but IMHO a correct solution is
easy to optimize later, so I wouldn't worry too much about it.

Ted