From: Juri Linkov
Newsgroups: gmane.emacs.bugs
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Sun, 09 Dec 2012 01:07:12 +0200
Organization: JURTA
Message-ID: <87ip8cz2zu.fsf@mail.jurta.org>
References: <20121130182205.C722F14B8D@panix1.panix.com> <87ip8fjzwn.fsf@gnu.org> <871uf2647i.fsf@mail.jurta.org> <50C1C6CC.9020103@gmx.at> <87ehj18l9p.fsf@mail.jurta.org> <50C322CC.1000806@gmx.at>
In-Reply-To: <50C322CC.1000806@gmx.at> (martin rudalics's message of "Sat, 08 Dec 2012 12:21:48 +0100")
To: martin rudalics
Cc: 13041@debbugs.gnu.org, perin@panix.com, perin@acm.org

> This means that when you type the second "f" you might get a match
> before the present one.  Consider a buffer containing the two lines
> suﬀer
> suffer
>
> Typing "suf" as search string would go to "suffer".  Adding an "f" to
> the search string now would go back to "suﬀer" (or not).

Going back looks like backtracking in the regexp search.
OTOH, instead of matching a ligature only on a full match, as
Chromium does, we could do as GEdit and OpenOffice do and match
the whole ligature character on a partial match (i.e. match "ﬀ"
when the search string is just "f").  This has the problem that
highlighting the whole character for a partial match looks wrong,
but perhaps no one can do better.

>> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
>> http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
>
> Case folding "ß" to "SS" (upper case "S") is not what I had in mind.  I
> was talking about the (weak?) equivalence of "ß" and "ss" (lower case
> "s") which is much more important when searching.  In particular so,
> because many German words that were earlier written with an "ß" are now
> written with "ss".

Yes, this is what I meant too.  It is surprising, but
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
defines the equivalence of "ß" and "ss" (lower case "s")
rather than a mapping to upper case.  The following line
in CaseFolding.txt:

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

maps 00DF (LATIN SMALL LETTER SHARP S) to two characters
0073 0073 (LATIN SMALL LETTER S), keeping the lower case.
Maybe this is a bug in Unicode data?
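
The mapping quoted above can be checked with a minimal Python sketch. It assumes Python 3, whose built-in str.casefold applies Unicode full case folding; it only stands in for whatever Emacs would implement. The helper names parse_casefolding_line and fold_search are hypothetical and exist only for this illustration.

```python
# The CaseFolding.txt line quoted in the message, parsed by hand.
LINE = "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S"


def parse_casefolding_line(line):
    """Return (source char, status, folded string) for one data line.

    Hypothetical helper: handles only the 'code; status; mapping; # name'
    layout shown in the message.
    """
    data = line.split("#", 1)[0]                  # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]
    code, status, mapping = fields[0], fields[1], fields[2]
    source = chr(int(code, 16))
    folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
    return source, status, folded


def fold_search(needle, lines):
    """Return the lines whose folded form contains the folded needle.

    Hypothetical helper: folding both sides means "f" or "suf" also hits
    a line containing the U+FB00 ligature, i.e. the partial-match
    behaviour attributed to GEdit and OpenOffice.
    """
    folded_needle = needle.casefold()
    return [l for l in lines if folded_needle in l.casefold()]


source, status, folded = parse_casefolding_line(LINE)
print(repr(source), status, repr(folded))

# The two-line buffer from the quoted example; \ufb00 is the "ff" ligature.
buffer_lines = ["su\ufb00er", "suffer"]
print(fold_search("suf", buffer_lines))
print(fold_search("suff", buffer_lines))

# Sharp s folds to lower-case "ss", so both spellings match one query.
print(fold_search("strasse", ["Stra\u00dfe", "Strasse"]))
```

Because the sketch returns whole lines rather than match positions, it sidesteps the highlighting question raised above: once a partial match falls inside a ligature, the only character available to highlight is the whole ligature.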