From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Upcoming loss of usability of Emacs source files and Emacs.
Date: Thu, 18 Jun 2015 10:41:02 +0300
Message-ID: <83twu5lb01.fsf@gnu.org>
To: Ulrich Mueller
Cc: eggert@cs.ucla.edu, rms@gnu.org, nicolas@petton.fr, emacs-devel@gnu.org, tsdh@gnu.org, acm@muc.de, stephen@xemacs.org

> Date: Thu, 18 Jun 2015 09:08:06 +0200
> Cc: mbork@mbork.pl, eggert@cs.ucla.edu, rms@gnu.org, nicolas@petton.fr,
>     emacs-devel@gnu.org, tsdh@gnu.org, acm@muc.de, stephen@xemacs.org
> From: Ulrich Mueller
>
> >>>>> On Thu, 18 Jun 2015, Eli Zaretskii wrote:
>
> >> ;; Ignore accent and umlaut marks when searching.
> >> ;; Works for Emacs 19.30 and later.
> >> (let ((eqv-list '("aAàÀáÁâÂãÃäÄåÅ"
> >>                   "cCçÇ"
> >>                   "eEèÈéÉêÊëË"
> >>                   "iIìÌíÍîÎïÏ"
> >>                   "nNñÑ"
> >>                   "oOòÒóÓôÔõÕöÖøØ"
> >>                   "uUùÙúÚûÛüÜ"
> >>                   "yYýÝÿ"))
> >>       (table (standard-case-table))
> >>       canon)
> >>   (setq canon (copy-sequence table))
> >>   (mapcar (lambda (s)
> >>             (mapcar (lambda (c) (aset canon c (aref s 0))) s))
> >>           eqv-list)
> >>   (set-char-table-extra-slot table 1 canon)
> >>   (set-char-table-extra-slot table 2 nil)
> >>   (set-standard-case-table table))

Btw, the above doesn't work at all for me in Emacs 25: searching for 'a'
doesn't find the variants with diacriticals.  Maybe I didn't use it
correctly.  Is something else required beyond evaluating the expression
and making sure I-search does a case-insensitive search?

> > Also, this doesn't handle decomposed characters, as in 'å' ('a' followed
> > by U+030A COMBINING RING ABOVE).  So this is not really Unicode-compliant,
> > it's a half-measure of sorts.
>
> The above code snippet predates Unicode Emacs, so you cannot expect it
> to handle NFC and NFD and other intricacies of Unicode normalisation.
> (Also I've never seen anything else than the NFC forms, e.g., for
> German umlauts, in the texts that I usually edit.)

Mac OS X's HFS+ filesystem holds file names in NFD, AFAIK.

And diacriticals are only the tip of the iceberg.  E.g., when you search
for 'n', won't you want to find 'ⁿ' and '🄝' as well, at least
sometimes, and likewise with '²' and '⒉' and '🄃' when looking for '2'?
These require support for compatibility decompositions, not just for
canonical decompositions as in the case of diacriticals.
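To make the canonical versus compatibility distinction concrete, here is a small sketch using Python's standard `unicodedata` module. Python is used purely for illustration; nothing below is Emacs API, and the helper name `decompositions` is hypothetical. It checks that precomposed 'å' (U+00E5) and decomposed 'a' + U+030A are canonically equivalent, and that the 'n' and '2' look-alikes mentioned above are left untouched by canonical decomposition (NFD) and only reduce to plain characters under compatibility decomposition (NFKD).

```python
import unicodedata


def decompositions(s):
    """Return the (NFD, NFKD) forms of s (hypothetical helper, for illustration)."""
    return unicodedata.normalize("NFD", s), unicodedata.normalize("NFKD", s)


# Precomposed U+00E5 and 'a' + U+030A differ code point by code point,
# but they are canonically equivalent: NFD/NFC map them to one form.
precomposed = "\u00e5"
decomposed = "a\u030a"
assert precomposed != decomposed
assert decompositions(precomposed)[0] == decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed

# Compatibility characters from the message: NFD leaves each one alone,
# only NFKD maps it to plain letters, digits and punctuation.
compat = {
    "\u207f": "n",        # SUPERSCRIPT LATIN SMALL LETTER N
    "\U0001F11D": "(N)",  # PARENTHESIZED LATIN CAPITAL LETTER N
    "\u00b2": "2",        # SUPERSCRIPT TWO
    "\u2489": "2.",       # DIGIT TWO FULL STOP
    "\U0001F103": "2,",   # DIGIT TWO COMMA
}
for ch, plain in compat.items():
    nfd, nfkd = decompositions(ch)
    assert nfd == ch
    assert nfkd == plain
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: NFKD -> {nfkd!r}")
```

Note that the NFKD form of '🄝' is '(N)', with an uppercase letter and parentheses, so matching it when searching for 'n' also needs case folding and some policy for the extra punctuation.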
> BTW, also isearch-forward doesn't match 'å' ('a' + U+030A) when
> searching for 'å' (U+00E5), and vice versa.  So by your above argument,
> search in Emacs isn't Unicode compliant anyway.

Of course, Emacs isn't Unicode-compliant; this is why I said this
feature is sorely needed, and that your proposal is a half-measure.

> (But not sure if it should be, because I think that this would break
> Boyer-Moore.)

It's already broken for multibyte characters anyway.  And yes, handling
equivalence in searching complicates the algorithm even more, but that's
a necessary price for the extended functionality.
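As a rough illustration of what handling equivalence in searching involves, here is a minimal Python sketch (again not how Emacs implements search). It folds both strings (NFKD, then dropping combining marks, then case folding), searches the folded text with plain `str.find`, and uses an index map to translate the match back to positions in the original text. The names `fold` and `folded_find` are hypothetical, and treating every character with a nonzero `unicodedata.combining` class as a droppable diacritic is a simplifying assumption. The position bookkeeping is one way to see why an equivalence-aware search cannot simply run over the raw text.

```python
import unicodedata


def fold(text):
    """Fold text for equivalence-insensitive matching (hypothetical helper).

    Returns (folded, origin), where origin[k] is the index in `text`
    of the character that produced folded[k].
    """
    folded, origin = [], []
    for i, ch in enumerate(text):
        # Compatibility decomposition: 'å' -> 'a' + U+030A, '²' -> '2'.
        for part in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(part):
                continue  # assumption: any combining mark is a droppable diacritic
            # casefold() may expand one character into several ('ß' -> 'ss').
            for c in part.casefold():
                folded.append(c)
                origin.append(i)
    return "".join(folded), origin


def folded_find(haystack, needle):
    """Return (start, end) of the first equivalent match in haystack, or None."""
    h, origin = fold(haystack)
    n, _ = fold(needle)
    if not n:
        return None
    pos = h.find(n)
    if pos < 0:
        return None
    start = origin[pos]
    end = origin[pos + len(n) - 1] + 1
    # Extend the match over trailing combining marks that were folded away.
    while end < len(haystack) and unicodedata.combining(haystack[end]):
        end += 1
    return start, end
```

For example, `folded_find("cafe\u0301 au lait", "café")` returns `(0, 5)`, so the match covers the decomposed 'é' including its combining accent.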