From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: ASCII-folded search [was: Re: Upcoming loss of usability ...]
Date: Thu, 18 Jun 2015 11:46:44 +0300
Message-ID: <83oakdl7yj.fsf@gnu.org>
To: bruce.connor.am@gmail.com
Cc: stephen@xemacs.org, monnier@iro.umontreal.ca, emacs-devel@gnu.org

> Date: Thu, 18 Jun 2015 09:16:16 +0100
> From: Artur Malabarba
> Cc: "Stephen J. Turnbull", emacs-devel, Stefan Monnier
>
> If we're bringing this up again, I kindly suggest that people have a
> look at a previous thread here called "Character group folding in
> searches". Here's the link:
> https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00362.html
>
> It offers 3 implementations. I think the second
> (group-folding-with-case-table-lisp.patch) was quite adequate.
> The only reason it wasn't applied is that Eli and Stefan wanted
> something that also matched non-spacing marks, or composite
> characters, whereas case tables can only map single characters to
> single characters.
>
> The first implementation (group-folding-with-regexp-lisp.patch) didn't
> have this caveat because it didn't use case tables.
> The only caveat it had is that it only works for non-regexp searches.
>
> Attached are the two patches I mentioned here.

Perhaps we should install the first one; it at least solves part of
the problem. But I think it has a subtle bug: it assumes that
get-char-code-property returns a value of the form
'(SOMETHING CH1 ...)', where CH1 is necessarily the character we
want. However, that is false at least in some cases, for example:

  (get-char-code-property ?🄝 'decomposition) => (compat 40 78 41)

Here CH1 is 40, i.e. '(', whereas the character we want is 78, 'N'.

So I think perhaps we should subject the return value of
get-char-code-property to a few more tests, and either reject such
values, or maybe find the first character that matches [:alnum:], and
use that.

Btw, can the other patch be installed _in_addition_ to the first one?
IOW, they are orthogonal, or could be that, right?

Thanks.
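
To illustrate the "first character that matches [:alnum:]" idea, here
is a minimal sketch. The function name my-fold-base-char is made up
for this example and appears in neither patch; it also assumes that
any leading symbol in the decomposition (such as 'compat') is a tag
that can simply be skipped.

```elisp
;; Hypothetical helper, not taken from either patch.
(defun my-fold-base-char (char)
  "Return the first alphanumeric character in CHAR's decomposition.
Return nil if the decomposition contains no such character."
  (let ((decomp (get-char-code-property char 'decomposition)))
    ;; Assumption: a leading symbol (e.g. `compat') is a formatting
    ;; tag, not a character, so drop it.
    (when (symbolp (car decomp))
      (setq decomp (cdr decomp)))
    ;; Scan the remaining characters and stop at the first one that
    ;; matches [:alnum:]; fall through to nil if none does.
    (catch 'found
      (dolist (c decomp)
        (when (string-match-p "[[:alnum:]]" (char-to-string c))
          (throw 'found c)))
      nil)))

;; U+1F11D is the parenthesized N from the example above.
(my-fold-base-char #x1F11D)
```

Given the decomposition (compat 40 78 41), the call skips 40 and
yields 78 (?N). A caller that prefers to reject such values instead
could compare the result with the first element after the tag and
bail out when they differ.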