From: Artur Malabarba
Newsgroups: gmane.emacs.devel
Subject: Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)
Date: Thu, 5 Feb 2015 23:17:42 +0000
References: <87fvakvwbf.fsf@lifelogs.com>
Reply-To: bruce.connor.am@gmail.com
To: emacs-devel

As for answering your questions:

>> implementing it for users so it works like `case-fold-search' (you just
>> set something in Customize and all search commands DWYM) seems much
>> harder.

Doing it as part of Emacs is not terribly hard, but it has disadvantages. Namely, the case-fold-search machinery only relates one character to another character (1 to 1). At least for Latin this would be enough a lot of the time, e.g. you can use it to relate "á" to "a". However, there's another way of writing "á" which takes two characters (the base letter followed by a combining accent), and this situation can't be handled (AFAIK) by the case-fold-search machinery.
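
To make the two spellings concrete, here is a small sketch in Python (not Emacs Lisp, just for illustration). The one-entry table `FOLD` is a hypothetical stand-in for a 1-to-1 folding table; it is not how Emacs case tables are actually built. It suggests why a per-character map handles the single-character form but leaves the combining mark behind in the two-character form.

```python
import unicodedata

precomposed = "\u00e1"   # "á" as a single code point (U+00E1)
decomposed = "a\u0301"   # "a" followed by U+0301 COMBINING ACUTE ACCENT

# Hypothetical 1-to-1 fold table: each key maps to exactly one character.
FOLD = {ord("\u00e1"): "a"}

folded_pre = precomposed.translate(FOLD)   # the single code point folds to "a"
folded_dec = decomposed.translate(FOLD)    # no entry matches; the mark survives

print(len(precomposed), len(decomposed))   # 1 2
print(folded_pre == "a")                   # True
print(folded_dec == "a")                   # False

# Both spellings are canonically equivalent: NFC composes the pair.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Handling the second spelling needs something that can relate a sequence of characters to one character, which is exactly what a 1-to-1 table cannot express.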

The bright side is that I think this two-char way of writing Latin accents is much less common (not 100% sure though, it's hard to tell the difference). The downside is that I know nothing about other languages, so maybe using two chars to represent one char is the default behavior in some other languages?

>> Does anyone have suggestions? Maybe some defadvice magic?

You can use a defadvice around one of the isearch internal functions (check out the branch I mentioned) to implement something in elisp. And you can redefine the buffer's case-folding table and use that in the advice, but that will require that you generate the entire table.
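
As a rough idea of what "generating the entire table" could involve, the Python sketch below derives a 1-to-1 map from Unicode decomposition data. It is an assumption-laden illustration, not the approach taken in the branch mentioned above: the function name `build_fold_table` and the default code point range are made up for the example, and a real Emacs version would have to fill a case table in elisp instead.

```python
import unicodedata

def build_fold_table(first=0x0080, last=0x1FFF):
    """Map each precomposed character in [first, last] to its base character.

    Hypothetical helper: the default range covers the Latin, Greek and
    Cyrillic blocks only, to keep the sketch small.
    """
    table = {}
    for cp in range(first, last + 1):
        decomp = unicodedata.normalize("NFD", chr(cp))
        if len(decomp) < 2:
            continue  # no canonical decomposition, nothing to fold
        base, marks = decomp[0], decomp[1:]
        # Keep only "base letter + nonspacing marks" decompositions.
        if unicodedata.category(base) == "Mn":
            continue
        if all(unicodedata.category(m) == "Mn" for m in marks):
            table[cp] = base
    return table

table = build_fold_table()
print(table[0x00E1])                   # a
print("caf\u00e9".translate(table))    # cafe
print("a\u0301".translate(table) == "a")  # False: two-char form is untouched
```

The last line shows the same limitation as before: the generated table is still strictly one character to one character, so text written with combining marks is not folded.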