From mboxrd@z Thu Jan 1 00:00:00 1970
From: Artur Malabarba
Newsgroups: gmane.emacs.devel
Subject: Re: Single quotes in Info
Date: Tue, 27 Jan 2015 23:15:22 -0200
References: <87twzhgk84.fsf@wmi.amu.edu.pl> <83lhksshdm.fsf@gnu.org> <9ee0c895-a178-40e1-b1c8-ed2b97071c6b@default> <87h9vgglkz.fsf@wmi.amu.edu.pl> <83h9vcp0bq.fsf@gnu.org> <83y4onorcc.fsf@gnu.org>
In-Reply-To: <83y4onorcc.fsf@gnu.org>
Reply-To: bruce.connor.am@gmail.com
To: Eli Zaretskii
Cc: emacs-devel
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:181865
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a11c1ed5406ef98050dac1b3c

--001a11c1ed5406ef98050dac1b3c
Content-Type: text/plain; charset=UTF-8

Eli, if I may ask, did you get a chance to see the code? (it's quite short)
The last couple emails give me the impression we're not quite on the same page.

On 27 Jan 2015 19:18, "Eli Zaretskii" <eliz@gnu.org> wrote:
>
> > Date: Tue, 27 Jan 2015 18:24:09 -0200
> > From: Artur Malabarba <bruce.connor.am@gmail.com>
> > Cc: Marcin Borkowski <mbork@wmi.amu.edu.pl>, emacs-devel <emacs-devel@gnu.org>
> >
> > > If this is implemented in isearch, then IMO doing it for quotes alone
> > > makes very little sense.
> >
> > The quotes are just proof of concept.
>
> Yes, but what concept is that?  Does it scale up to a general-purpose
> feature of the kind that suits isearch.el?  Just replacing one
> character for another doesn't, IMO.

No. It replaces one character with an arbitrary regexp. In the quotes case that's used to match about a dozen different quotation characters, but it's not limited to that. You can also use that to implement lax-whi

> > > If we do this via our private database, that database is going to be
> > > huge.
> >
> > Is it? I would expect something on the order of 50 lines.
>
> There are more than 5000 characters in the Unicode database that have
> equivalence and canonical decompositions.  (Look for entries in
> UnicodeData.txt whose 6th field is non-empty.)

The purpose of this is to allow the user to search for complex characters (such as curly quotes or any of these ＂“””„⹂〞‟‟❞❝❠“„〝〟🙷🙶🙸) by typing a simple character available on simple keyboards (such as the plain double quote "). Each simple character needs an entry in the `isearch-groups-alist' variable. The maximum number of entries we'll ever need in this alist (in the very worst possible scenario) is the number of simple characters on a simple keyboard (which is way less than 5000, last I checked).

This might be easier to understand looking at the code.
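
For readers without the patch at hand, here is a minimal sketch of the idea in Python. It is not the actual Emacs Lisp code: `SEARCH_GROUPS` is a hypothetical stand-in for `isearch-groups-alist', the function names are made up, and the character sets are only illustrative. Each simple character maps to a regexp, and every other character in the search string is matched literally.

```python
import re

# Hypothetical stand-in for `isearch-groups-alist': each simple character
# maps to a regexp that matches it and its fancier equivalents.
# The character sets below are illustrative, not exhaustive.
SEARCH_GROUPS = {
    '"': '["\uff02\u201c\u201d\u201e\u201f\u275d\u275e\u301d\u301e\u301f]',
    "'": "['\u2018\u2019\u201a\u201b]",
}


def expand_query(query):
    """Build a regexp from QUERY, expanding characters listed in the table.

    Characters without an entry are escaped so they match literally.
    """
    return "".join(SEARCH_GROUPS.get(ch, re.escape(ch)) for ch in query)


def search(query, text):
    """Return the first match of the expanded QUERY in TEXT, or None."""
    return re.search(expand_query(query), text)
```

With this table, typing the plain query `"hi"` also finds the same word wrapped in curly quotes, and the table needs only one entry per simple character, however many complex characters each entry covers.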

>
> > > We already have infrastructure for that, see
> > > the description of the 'decomposition' character property in the ELisp
> > > manual.
> >
> > Building this on preexisting infrastructure would be great, but does that go
> > the right way? Does it relate a simple character to all its complex
> > equivalents? Or does it relate each complex character to a simple alternative?
> The latter.  Read paragraph 1.1 of UAX #15 for the starting point, and
> also section 3.7 of the Unicode Standard.

If it's the latter, then it goes the wrong way for an automated approach. What we need is the whole set of Unicode characters that are equivalent to a given ASCII character. Of course we can build this table from the Unicode Standard (that's exactly what the `isearch-groups-alist' variable is meant to do); I'm just saying an automated approach probably isn't viable here.
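
To make the direction problem concrete, here is a rough sketch in Python rather than Emacs Lisp, using the standard `unicodedata` module; the function name `invert_decompositions` is made up for illustration. The decomposition data goes from each complex character to its simpler form, so getting the set we need means scanning every code point and inverting the mapping.

```python
import sys
import unicodedata
from collections import defaultdict


def invert_decompositions():
    """Map each base character to the set of characters that decompose to it.

    Only the first code point of each decomposition is treated as the base,
    which is a simplification made for this sketch.
    """
    groups = defaultdict(set)
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        fields = unicodedata.decomposition(ch).split()
        # Drop a compatibility tag such as "<wide>" if one is present.
        if fields and fields[0].startswith("<"):
            fields = fields[1:]
        if fields:
            base = chr(int(fields[0], 16))
            groups[base].add(ch)
    return groups


groups = invert_decompositions()
```

Even after inverting, the result is incomplete for our purposes: the fullwidth quotation mark (U+FF02) shows up under the plain double quote, but the curly quote U+201C has no decomposition at all, so it never appears in the inverted table.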
--001a11c1ed5406ef98050dac1b3c--