From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Character group folding in searches
Date: Sat, 07 Feb 2015 10:38:04 +0200
Message-ID: <83386ice1f.fsf@gnu.org>
References: <83zj8rcdpi.fsf@gnu.org> <83k2zudfqk.fsf@gnu.org> <83d25md8k5.fsf@gnu.org>
To: bruce.connor.am@gmail.com
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org

> Date: Fri, 6 Feb 2015 22:08:19 +0000
> From: Artur Malabarba
> Cc: Stefan Monnier, emacs-devel
>
> >> > Because the other way you cannot use char-tables. And because
> >> > matching "a" and "á" will be hard the other way.
> >>
> >> Maybe I'm missing something, but if you have "á" expand to "a´", it
> >> won't match "a", will it?
> >
> > It will, if you only pay attention to the base character.
>
> If you have the possibility of only paying attention to the base
> character (if the machinery is in place) then there's no reason to
> fold "á" into "a´" (folding 1 char into many).
>
> Just fold everything into "a". Then (by only paying attention to the
> base character) "á" and "á" will match, because "á" folds into "a"
> which is the base character of "á".

But we need both capabilities, since whether or not a match of the
base character is enough depends on what the caller/user wants.
Folding everything into the base character supports only part of those
features.

As the simplest example, how can you have "á" and "a´" match, but "á"
and "a¨" fail to match, if you _only_ look at the base character?

(Btw, using ´ and ¨ here is incorrect, the correct characters are
their combining variants, U+0301 and U+0308; I left the ones you used
just for clarity, to prevent Emacs from composing a and the following
combining character.)

And then there are more complex examples, like "q̣̇" that should
match "q̣̇" (because the ordering of combining marks doesn't matter),
etc.

What this tells me is that we do need to fold "á" into "a´", and
then use a comparison function that pays attention to the "folding
options" specified by the caller, to decide which parts of the folded
sequence to ignore, and also how to compare the non-ignored parts
(e.g., with some options the order of non-base characters should not
matter).
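
To make the idea concrete, here is a rough sketch of such a comparison,
written in Python rather than Emacs Lisp purely for illustration. It is
not a proposal for the actual implementation. The names decompose,
folded_equal, ignore_marks and ignore_mark_order are hypothetical, and
canonical decomposition (NFD) stands in for whatever folding table
Emacs would really use. Each character is folded into a base character
plus its combining marks, and the options decide which parts of that
sequence are compared.

```python
import unicodedata


def decompose(text):
    """Fold text into [base, marks] clusters via canonical decomposition (NFD).

    Assumption: NFD is used here as a stand-in for a real folding table.
    """
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            # Non-zero combining class: attach the mark to the previous base.
            clusters[-1][1].append(ch)
        else:
            clusters.append([ch, []])
    return clusters


def folded_equal(a, b, ignore_marks=False, ignore_mark_order=True):
    """Compare two strings after folding, honoring hypothetical folding options.

    ignore_marks:      compare base characters only.
    ignore_mark_order: treat the marks on each base as an unordered collection.
    """
    ca, cb = decompose(a), decompose(b)
    if len(ca) != len(cb):
        return False
    for (base_a, marks_a), (base_b, marks_b) in zip(ca, cb):
        if base_a != base_b:
            return False
        if ignore_marks:
            continue
        if ignore_mark_order:
            marks_a, marks_b = sorted(marks_a), sorted(marks_b)
        if marks_a != marks_b:
            return False
    return True


# Precomposed a-acute matches a + U+0301, but not a + U+0308.
print(folded_equal("\u00e1", "a\u0301"))
print(folded_equal("\u00e1", "a\u0308"))
# With base-only matching, the diaeresis variant matches as well.
print(folded_equal("\u00e1", "a\u0308", ignore_marks=True))
# Dot above and dot below on q, in either order.
print(folded_equal("q\u0307\u0323", "q\u0323\u0307"))
```

Note that folding everything to the base character corresponds to
ignore_marks=True only; the other cases need the marks to survive the
folding step, which is the point made above. NFD already reorders marks
of different combining classes, so the ignore_mark_order option only
makes a difference for marks of the same class (e.g., U+0301 and
U+0308).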