From: martin rudalics
Newsgroups: gmane.emacs.bugs
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Sat, 08 Dec 2012 12:21:48 +0100
Message-ID: <50C322CC.1000806@gmx.at>
In-Reply-To: <87ehj18l9p.fsf@mail.jurta.org>
To: Juri Linkov
Cc: 13041@debbugs.gnu.org, perin@panix.com, perin@acm.org

>> - leave the text alone but give each string that should be handled
>>   specially a text property with the normalized form.  In this case
>>   searching has to pay attention to these properties, if present.
>>
>> - normalize the text and give each normalized string a text property
>>   with the original text.  In this case searching will proceed as usual
>>   but you have to restore the original text when done.
>
> This reminds an idea that searching should take into account the text
> displayed with the `display' property and other display-related properties.
> It seems this is more difficult to implement.

... and probably should include searching for overlays too.

>> Also I don't know how to handle the return value and/or highlighting
>> when, for example, finding a match for "suf" within "suﬀer".  For
>> example, replacing each occurrence of "suf" with the empty string should
>> leave us with "fer" here.
>
> I believe such ligature characters should be handled as a whole,
> i.e. "suf" doesn't match "suﬀer", only "suff" should match it.

This means that when you type the second "f" you might get a match
before the present one.  Consider a buffer containing the two lines

suﬀer
suffer

Typing "suf" as search string would go to "suffer".  Adding an "f" to
the search string now would go back to "suﬀer" (or not).  Disconcerting
in any case.

>> I have no idea how many mappings like "ß" -> "ss" exist.  The problem is
>> that we don't get them from UnicodeData.txt IIUC.
>
> I can't find them in UnicodeData.txt too.
> Looking at the files in
> http://www.unicode.org/Public/UNIDATA/ can find them in the file
>
> http://www.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
>
> that is derived from
>
> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt

Case folding "ß" to "SS" (upper case "S") is not what I had in mind.  I
was talking about the (weak?) equivalence of "ß" and "ss" (lower case
"s") which is much more important when searching.  In particular so,
because many German words that were earlier written with an "ß" are now
written with "ss".

martin
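
The ligature and "ß" cases discussed above can be tried out with a small sketch. It is written in Python rather than Emacs Lisp and is not how Emacs implements searching; the names fold_with_map and find_folded are made up for this illustration. It assumes that NFKD normalization plus Python's str.casefold() is an acceptable stand-in for the folding under discussion: NFKD expands the "ﬀ" ligature to "ff", and full case folding maps "ß" to lower case "ss". Each folded character remembers which original character it came from, so a match can be mapped back and flagged when it covers only part of a ligature.

```python
import unicodedata


def fold_with_map(text):
    """Return (folded, index_map) where folded[i] came from text[index_map[i]]."""
    folded = []
    index_map = []
    for pos, ch in enumerate(text):
        # NFKD expands compatibility ligatures (U+FB00 -> "ff");
        # casefold() applies full case folding ("ß" -> "ss").
        # Combining marks produced by NFKD are kept, not stripped.
        expansion = unicodedata.normalize("NFKD", ch).casefold()
        for out in expansion:
            folded.append(out)
            index_map.append(pos)
    return "".join(folded), index_map


def find_folded(needle, text):
    """Find folded matches of needle in text.

    Returns a list of (start, end, partial) tuples. start and end index
    the ORIGINAL text; partial is True when the match begins or ends
    inside the expansion of a single original character.
    """
    folded_text, index_map = fold_with_map(text)
    folded_needle, _ = fold_with_map(needle)
    results = []
    if not folded_needle:
        return results
    start = folded_text.find(folded_needle)
    while start != -1:
        last = start + len(folded_needle) - 1
        begins_inside = start > 0 and index_map[start - 1] == index_map[start]
        ends_inside = (last + 1 < len(index_map)
                       and index_map[last + 1] == index_map[last])
        results.append((index_map[start], index_map[last] + 1,
                        begins_inside or ends_inside))
        start = folded_text.find(folded_needle, start + 1)
    return results


buffer_text = "su\ufb00er\nsuffer"   # the two-line buffer from the example
print(find_folded("suf", buffer_text))
print(find_folded("suff", buffer_text))
print(find_folded("strasse", "Straße"))
```

With the two-line buffer, "suf" is reported against the first line only as a partial match, while "suff" matches both lines in full. A caller could skip partial matches (ligatures handled as a whole) or widen them to the whole character; the sketch leaves that policy open, since it is exactly the point under debate.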