From: Artur Malabarba <bruce.connor.am@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Single quotes in Info
Date: Wed, 28 Jan 2015 19:38:08 -0200
Message-ID: <CAAdUY-JwX-p-ZzdExm9+cKs5pC0SUoLLs8ppA9esuXsRuHRdng@mail.gmail.com>
In-Reply-To: <83vbjrnd1f.fsf@gnu.org>

I've been looking into what you suggest, but it seems the
decomposition property won't be enough. It does give us the necessary
information for things like á and ç, but it doesn't say anything about
the quotes (which was the whole initial point), nor about characters
like ⇒ (which I think someone else on this thread suggested).
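
To make the gap concrete, here is a small sketch in Python rather than
Emacs Lisp. It assumes that Python's standard `unicodedata` module
reflects the same Unicode decomposition data as the 'decomposition'
property; the helper name `decomposition_of` is made up for this
illustration.

```python
import unicodedata

def decomposition_of(ch):
    """Return the Unicode decomposition mapping of ch as a list of code points.

    An empty list means the character has no decomposition mapping at all.
    """
    fields = unicodedata.decomposition(ch).split()
    # Compatibility mappings begin with a tag such as "<compat>"; skip the tag.
    return [int(field, 16) for field in fields if not field.startswith("<")]

# U+00E1 (a with acute), U+00E7 (c with cedilla),
# U+201C (left double quotation mark), U+21D2 (rightwards double arrow)
for ch in "\u00e1\u00e7\u201c\u21d2":
    print(f"U+{ord(ch):04X}", [f"U+{cp:04X}" for cp in decomposition_of(ch)])
```

The accented letters come back with a base letter plus a combining
mark, while the curly quote and the arrow come back empty, so the
decomposition data alone cannot relate them to " or to =>.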

Furthermore, the point here would be to have "a" and "á" match each
other, but the decomposition of "á" gives us two characters (as would
be expected). How are we to know programmatically which of these two
characters is to be considered equivalent to "a with acute"? Is it
safe to assume it's the first character?
Otherwise, if we demand that the user type a´ to match the á letter,
then this feature seems kind of moot.
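
One possible way to pick the base character without relying on
position, sketched in Python under the assumption that the canonical
combining class is the right criterion: take the first character of the
canonical decomposition whose combining class is 0. The function name
`base_character` is hypothetical. For the Latin letters discussed here
that turns out to be the first character, but I have not checked that
this holds for every script.

```python
import unicodedata

def base_character(ch):
    """Return the first non-combining character of ch's canonical decomposition.

    Assumes ch is a single character. Falls back to the first decomposed
    character if every character in the decomposition is a combining mark.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    for c in decomposed:
        # Combining class 0 marks a base character rather than a combining mark.
        if unicodedata.combining(c) == 0:
            return c
    return decomposed[0]

print(base_character("\u00e1"))  # a with acute
print(base_character("\u00e7"))  # c with cedilla
```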

2015-01-28 13:24 GMT-02:00 Eli Zaretskii <eliz@gnu.org>:
>> Date: Tue, 27 Jan 2015 23:15:22 -0200
>> From: Artur Malabarba <bruce.connor.am@gmail.com>
>> Cc: emacs-devel <emacs-devel@gnu.org>
>>
>> Eli, if I may ask, did you get a chance to see the code? (it's quite short)
>> The last couple emails give me the impression we're not quite on the same page.
>
> I did just now, and I don't think I was on a different page.
>
>> The purpose of this is to allow the user to search for complex characters (such as curly quotes or any of these "“””„⹂〞‟‟❞❝❠“„〝〟🙷🙶🙸) by typing a simple character available on simple keyboards (such as the plain double quote ").
>
> But that's exactly where it falls short of supporting a more general
> feature, which allows finding text that is "equivalent" to the one you
> search for.  The limitation to "simple characters available on simple
> keyboards" might seem a no-brainer for predominantly ASCII text, but
> it _is_ a serious limitation for any non-ASCII script, certainly for
> complex scripts, which Emacs has supported for years.
>
>> Each simple character needs an entry in the `isearch-groups-alist' variable. The max number of entries we'll ever need on this alist (in the very worst possible scenario) is the number of simple characters in a simple keyboard (which is way less than 5000 last I checked).
>
> You seem to forget that modern keyboards and input methods support
> much more than what meets the eye on the keyboard.  Even Latin locales
> provide non-ASCII characters such as á and å.  It is also not uncommon
> to copy/paste a search string from some text, in which case the search
> string could include the "complex" characters, but you'd still want to
> find their "simple" equivalents; your code, which transforms only the
> search string, cannot support this use case.  Moreover, CJK locales
> use input methods that can produce thousands of characters, and for
> people in those cultures such input is "simple" because they can use
> nothing simpler.
>
> Using a database that maps ASCII characters to regexps doesn't scale
> for supporting these use cases.  It doesn't even scale to the
> above-mentioned Latin characters, because á has a sequence of 2
> characters "a ́" as its canonical decomposition, so when I type á, I
> expect to find both á and "a ́", and vice versa.  More complex scripts
> have several forms of the same letter, such as the "final" form used
> in Arabic and Hebrew for the last letter in a word -- typing one of
> these forms should find any other form.  Etc. etc. -- there's a huge
> complexity behind all this, and we need to support it if we want to be
> respected as a text editor.
>
> The way to support this is similar to how we support case-insensitive
> search: we "fold" each character, both in the search string and in the
> text being searched, using case tables, and then compare the "folded"
> characters.  Similarly, to support equivalence, we need to produce a
> canonical/equivalent decomposition from each character on both sides
> of the comparison, and then compare the results.
>
> As I said before, we already have all the necessary data in the
> 'decomposition' property of each character, we just need to use it in
> a way that is similar to case tables, just slightly more complex
> (because we are no longer talking single characters).
>
>> > > Does it relate a simple character to all its complex
>> > > equivalents? Or does it relate each complex character to a simple alternative?
>> > The latter.  Read paragraph 1.1 of UAX #15 for the starting point, and
>> > also section 3.7 of the Unicode Standard.
>> If it's the latter, then it's the wrong way for us to do an automated approach. What we need is to know the whole set of Unicode characters which is equivalent to a given ASCII character. Of course we can build this table from the Unicode Standard (that's exactly what the `isearch-groups-alist' variable is meant to do), I'm just saying an automated approach probably isn't viable here.
>
> I don't see why it won't be viable, or maybe I don't understand what
> you mean by "automated" here.  I certainly don't think we should limit
> ourselves to "simple characters", not for something as general-purpose
> as text search.  This might be okay for Info only, but not if we want
> it in isearch.el.
>
> My idea is to use the 'decomposition' property to decompose each
> character in the search string and in the text being searched, when
> they need to be compared.  Exactly like we do with case-folding.
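
For reference, the both-sides comparison described in the quoted text
can be sketched as follows. This is only an illustration in Python, not
the isearch implementation: it assumes NFD normalization stands in for
"decompose each character", and `contains_equivalent` is a made-up
name.

```python
import unicodedata

def canonical(s):
    """Canonically decompose every character of s (Unicode NFD)."""
    return unicodedata.normalize("NFD", s)

def contains_equivalent(needle, haystack):
    """Report whether needle occurs in haystack after decomposing both sides."""
    return canonical(needle) in canonical(haystack)

composed = "\u00e1"      # precomposed a with acute
decomposed = "a\u0301"   # a followed by combining acute accent

print(contains_equivalent(composed, "caf " + decomposed))
print(contains_equivalent(decomposed, "caf " + composed))
print(contains_equivalent(composed, "cafe a"))
```

With this scheme the composed and decomposed forms find each other, and
a plain "a" in the search string also matches inside the decomposed
"á", since the comparison is a substring test. Searching for "á" does
not find a plain "a", and the quotes are still not covered at all.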


