From: "Drew Adams"
To: "'Eli Zaretskii'", "'Juri Linkov'"
Cc: perin@panix.com, 13041@debbugs.gnu.org, perin@acm.org
Newsgroups: gmane.emacs.bugs
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Sat, 1 Dec 2012 08:38:45 -0800
Message-ID: <7DD994F3BDA241E19AFF870D8115AE51@us.oracle.com>
In-Reply-To: <83fw3qtboc.fsf@gnu.org>
References: <20121130182205.C722F14B8D@panix1.panix.com> <87hao69b5r.fsf@mail.jurta.org> <20665.8224.844876.619203@panix5.panix.com> <87hao6zko4.fsf@mail.jurta.org> <83fw3qtboc.fsf@gnu.org>

> I don't understand why this thread is talking only about Latin
> characters with diacritics.  That is a special case of what Unicode
> calls "compatibility equivalence" (q.e.).  For example, even in the
> Latin environments, don't you want to find "sniﬀ" when searching for
> "sniff", and vice versa?  And there are similar issues in many
> non-Latin scripts.

Actually, in the original thread I made the same point.
Please see that discussion for this and other points.

http://lists.gnu.org/archive/html/help-gnu-emacs/2012-11/msg00429.html

> The decomposition of a character such as 'è' is given by
> the Unicode database...  Emacs already supports these
> decomposition properties.

That's good news (new to me).  So it sounds like even the most hopeful
wanna-haves of the discussion could perhaps be realized without too much
trouble.

> Using these properties, every search string can be converted to a
> sequence of non-decomposable characters (this process is recursive,
> because the 'decomposition' property can use characters that
> themselves are decomposable).
> If the user wants to ignore diacritics,
> then the diacritics should be dropped from the decomposition sequence
> before starting the search.  E.g., for the decomposition of è above,
> we will drop the 768 and will be left with 101, which is 'e'.  Then
> searching for that string should apply the same decomposition
> transformation to the text being searched, when comparing them.
>
> This would be the most general way of solving this issue, a way that
> is not limited to diacritics nor to Latin scripts.  And doing that
> will move Emacs closer to the goal of being Unicode compatible, since
> support for this is required by the Unicode Standard.

This sounds great.  I really hope someone with the time and knowledge adds
such a feature soon (even though, to be clear, I personally do not have much
need for it).  I think it would be very handy for many users - most welcome.
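
For illustration only, the approach quoted above (decompose recursively,
optionally drop the diacritics, then apply the same transformation to both
the search string and the text being searched) can be sketched roughly as
follows.  This is Python rather than Emacs Lisp, and it is not how Emacs
does or would implement the feature.  The names `fold` and `folded_search`
are hypothetical, and treating "diacritics" as the nonspacing marks
(general category Mn) is an assumption made for this sketch.

```python
import unicodedata


def fold(text: str, ignore_diacritics: bool = True) -> str:
    """Hypothetical helper: fully decompose TEXT, optionally dropping marks."""
    # NFKD applies canonical and compatibility decompositions recursively,
    # so U+00E8 (232) becomes 'e' (101) + U+0300 (768), and the ligature
    # U+FB00 becomes the two letters "ff".
    decomposed = unicodedata.normalize("NFKD", text)
    if not ignore_diacritics:
        return decomposed
    # Assumption: "diacritics" means nonspacing marks (category Mn),
    # such as the 768 in the example above.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def folded_search(needle: str, haystack: str,
                  ignore_diacritics: bool = True) -> int:
    """Hypothetical helper: find NEEDLE in HAYSTACK after folding both.

    The returned index refers to the folded haystack, not the original
    text; -1 means no match.
    """
    folded_haystack = fold(haystack, ignore_diacritics)
    return folded_haystack.find(fold(needle, ignore_diacritics))
```

With this sketch, searching for "sniff" matches text containing the
ligature form, and searching for "e" matches "è" (and vice versa).  A real
implementation would also need to map match positions back to the
original buffer text, which this sketch does not attempt.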