From: Artur Malabarba
Newsgroups: gmane.emacs.devel
Subject: Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)
Date: Thu, 5 Feb 2015 23:06:26 +0000
References: <87fvakvwbf.fsf@lifelogs.com>
In-Reply-To: <87fvakvwbf.fsf@lifelogs.com>
Reply-To: bruce.connor.am@gmail.com
To: emacs-devel

Something essentially identical to this was discussed here a couple of
weeks ago. Look for the thread "Single quotes in Info".

I wrote a small elisp solution for building this into isearch (which
you can find on the "scratch/isearch-character-group-folding" branch).
It takes a different approach from yours, relating characters to
regexps, but it works. It's not merged because I was advised to look
into using the case-fold-search machinery instead.

2015-02-05 20:16 GMT-02:00 Ted Zlatanov:
> https://emacs.stackexchange.com/questions/7992/how-to-search-an-arabic-word-in-text-without-its-diacritics-accents
> suggested it would be useful if diacritics were ignored when searching
> for text in various situations. This is similar to `case-fold-search'
> but more generic. Here's what I suggested as the answer at the ELisp
> level:
>
> #+begin_src emacs-lisp
> (defun kill-marks (string)
>   (concat (loop for c across string
>                 when (not (eq 'Mn (get-char-code-property c 'general-category)))
>                 collect c)))
>
> (let* ((original1 "your Arabic string here")
>        (normalized1 (ucs-normalize-NFKD-string original1))
>        (original2 "your other Arabic string here")
>        (normalized2 (ucs-normalize-NFKD-string original2)))
>   (equal
>    (replace-regexp-in-string "." 'kill-marks normalized1)
>    (replace-regexp-in-string "." 'kill-marks normalized2)))
> #+end_src
>
> This would probably be useful for other languages, not just Arabic. But
> implementing it for users so it works like `case-fold-search' (you just
> set something in Customize and all search commands DWYM) seems much
> harder. Does anyone have suggestions? Maybe some defadvice magic? Or is
> it not possible?
>
> Thanks
> Ted
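
For reference, the string-level comparison quoted above can be sketched
more compactly. This is only an illustration under a few assumptions: it
uses `cl-lib' instead of the `loop' macro from the old `cl' package,
strips the marks in a single pass rather than going through
`replace-regexp-in-string', and the names `my-strip-marks' and
`my-marks-equal-p' are hypothetical (they exist neither in Emacs nor on
the branch mentioned above). It does not address the harder question of
hooking this into the search commands.

#+begin_src emacs-lisp
(require 'cl-lib)
(require 'ucs-normalize)

(defun my-strip-marks (string)
  "Return STRING in NFKD form with nonspacing marks (category Mn) removed.
Hypothetical helper for illustration; not part of Emacs."
  (concat
   (cl-remove-if
    ;; Drop every character whose Unicode general category is Mn.
    (lambda (c)
      (eq 'Mn (get-char-code-property c 'general-category)))
    ;; Decompose first so precomposed letters split into base + mark.
    (string-to-list (ucs-normalize-NFKD-string string)))))

(defun my-marks-equal-p (a b)
  "Return non-nil if A and B are equal once nonspacing marks are ignored.
Hypothetical helper for illustration; not part of Emacs."
  (string= (my-strip-marks a) (my-strip-marks b)))

;; Usage: compare a string with and without an accent.
(my-marks-equal-p "café" "cafe")
#+end_src

With these definitions the last form should return non-nil, since NFKD
splits "é" into "e" followed by U+0301, which has general category Mn.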