From: Artur Malabarba <bruce.connor.am@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Questions about isearch
Date: Wed, 25 Nov 2015 21:49:58 +0000
In-Reply-To: <83a8q1x1cn.fsf@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel <emacs-devel@gnu.org>

On 25 Nov 2015 8:36 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
>
> > Date: Wed, 25 Nov 2015 20:14:06 +0000
> > From: Artur Malabarba <bruce.connor.am@gmail.com>
> > Cc: emacs-devel <emacs-devel@gnu.org>
> >
> > > 1. Character folding doesn't catch ligatures, such as æ (should it match
> > > the two characters "ae")?
> >
> > I've no idea. It would be easy to add.
>
> No, I meant to ask why it doesn't work already.  AFAIU, the
> decomposition of ﬀ is "ff":
>
>   (get-char-code-property ?ﬀ 'decomposition)
>     => (compat 102 102)
>
> but searching for 'f' doesn't match the ligature.  (æ doesn't have a
> decomposition in the Unicode database, so maybe it's a different
> case.)

I see. I thought this was a case of adding an ad hoc rule. I'll have to look into it over the weekend to see why f doesn't match ﬀ.

> > > 2. It also doesn't match ä (a single character) with ä (2 characters,
> > > which Emacs correctly composes into 1 grapheme cluster). Should it?
> >
> > Possibly. Since they look the same, might make things easier on users. But I
> > wouldn't know as I've never seen the second version used anywhere.
>
> Once again, the decomposition attribute says we should match them:
>
>   (get-char-code-property ?ä 'decomposition)
>     => (97 776)
>
> and the second character in ä is U+0308 = 776.  Doesn't that say we
> should have matched them?

That's different. Currently we use the decomposition attribute to decide that "a" should match ä. Our approach so far has been that searching for the "easy to type" characters should match the "hard to type" characters, but searching for the "hard to type" characters will only match the character itself. So right now it is working as intended.
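
To make the asymmetry concrete, here is a rough sketch of the rule as described above. It is not the actual isearch code: `my-fold-base-char' and `my-fold-match-p' are made-up names, and skipping a leading tag such as `compat' is an assumption of the sketch, so it says nothing about why the ligature case currently fails.

```elisp
;; Hypothetical helpers for illustration only; not part of isearch.

(defun my-fold-base-char (char)
  "Return the first character of CHAR's decomposition, or CHAR itself.
A leading tag symbol such as `compat' is skipped (an assumption)."
  (let ((decomp (get-char-code-property char 'decomposition)))
    ;; Drop tag symbols; what remains is a list of characters.
    (while (and decomp (symbolp (car decomp)))
      (setq decomp (cdr decomp)))
    (or (car decomp) char)))

(defun my-fold-match-p (typed candidate)
  "Return non-nil if searching for TYPED should match CANDIDATE.
TYPED matches itself, and it matches any CANDIDATE whose decomposition
starts with TYPED.  The reverse direction is deliberately not tried."
  (or (= typed candidate)
      (= typed (my-fold-base-char candidate))))

;; #xE4 is the precomposed character ä.
(my-fold-match-p ?a #xE4)   ; "easy" a finds "hard" ä
(my-fold-match-p #xE4 ?a)   ; "hard" ä does not find plain a
```

The second call returns nil because the decomposition is only consulted for the candidate, never for the typed character.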

We can (and I think we should) extend that last case so that searching for the "hard to type" characters will only match the character itself or its exact decomposition.
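
One way that extension could look, purely as a sketch: build a regexp that accepts either the character or its decomposition sequence. `my-fold-exact-regexp' is a made-up name, and treating only untagged (canonical) decompositions as "exact" is my assumption here, so tagged ones like (compat 102 102) are left alone.

```elisp
;; Hypothetical helper for illustration only; not part of isearch.

(defun my-fold-exact-regexp (char)
  "Return a regexp matching CHAR itself or its exact decomposition.
Only untagged decompositions are used (an assumption of this sketch)."
  (let ((decomp (get-char-code-property char 'decomposition)))
    (if (or (symbolp (car decomp))   ; tagged, e.g. (compat 102 102)
            (null (cdr decomp)))     ; no decomposition: the list is (CHAR)
        (regexp-quote (string char))
      ;; Accept the precomposed character or the decomposed sequence.
      (regexp-opt (list (string char) (apply #'string decomp))))))

;; #xE4 is precomposed ä; #x308 is the combining diaeresis.
(let ((re (my-fold-exact-regexp #xE4)))
  (list (string-match-p re (string #xE4))       ; precomposed form
        (string-match-p re (string ?a #x308))   ; a + U+0308
        (string-match-p re "a")))               ; plain a
```

With this, the first two elements of the list are match positions and the last is nil, so searching for ä would find both spellings of ä but still not a bare "a".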
