From: Artur Malabarba <bruce.connor.am@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Character group folding in searches
Date: Fri, 6 Feb 2015 14:18:39 -0200
Message-ID: <CAAdUY-L0E5XQwSAwRrvU-4VtmYCwNKyHB8y5EaoUXtOctE4HBw@mail.gmail.com>
In-Reply-To: <83zj8rcdpi.fsf@gnu.org>
> The full set of "folding" transformations is described in the Unicode
> technical report UTR #30. It was withdrawn, but its last draft is
> still enlightening.
>
> I think we should support some subset of what's described there.
>
> The way to do it IMO is to generate a set of char-tables where each
> character is mapped to its folded variant,
> one char-table for each subset of folding.
Although the attached patches only define one table for now, all of
them support multiple tables (even the one that's not based on
char-tables), so the sky is the limit. This detail therefore
shouldn't be an obstacle; we can decide later which subset of
foldings we want to provide by default.
> A character whose folding is not a single
> character should map to a vector or a string of characters (not sure
> which one is best, we should choose the one that lends itself to the
> most efficient use).
> I think the best approach is to modify search.c to be able to handle
> folding that produces more than a single character. I think we will
> also need search.c to support several alternative foldings for the
> same search operation. Making these changes would be relatively easy,
It's certainly doable, but I'm not sure it's easy. The `search_buffer'
function seems pretty focused on handling one character at a time.
Having a single character suddenly turn into two might require
significant changes to the code flow.
Of course, if someone takes that up, that's great!
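To make the multi-character problem concrete, here is a rough Python
sketch. It is not Emacs code and not what any of the patches do;
`FOLD_TABLE', `fold_char' and `folded_search' are made-up names, and
the table is a tiny hypothetical sample. It assumes a folding table
whose values may be longer than one character. Once an entry such as
the "ﬁ" ligature (U+FB01) expands to "fi", positions in the folded text
no longer line up with positions in the original text, so the matcher
has to carry a map back to the original. That is the kind of
bookkeeping a loop assuming "one character in, one character out"
would need to grow.

```python
# Hypothetical folding table: each key maps to its folded form, which
# may be longer than one character (e.g. a ligature expanding to two).
FOLD_TABLE = {
    "á": "a",
    "à": "a",
    "é": "e",
    "æ": "ae",
    "\ufb01": "fi",  # LATIN SMALL LIGATURE FI folds to two characters
}


def fold_char(ch):
    """Return the folded form of CH (a string of length >= 1)."""
    return FOLD_TABLE.get(ch, ch)


def folded_search(pattern, text):
    """Return (start, end) offsets in TEXT of the first match of PATTERN
    after folding both, or None if there is no match."""
    folded_pattern = "".join(fold_char(c) for c in pattern)
    if not folded_pattern:
        return None
    pieces = []
    origin = []  # origin[i] = index in TEXT that produced folded char i
    for i, ch in enumerate(text):
        for f in fold_char(ch):
            pieces.append(f)
            origin.append(i)
    pos = "".join(pieces).find(folded_pattern)
    if pos < 0:
        return None
    # Map the first and last folded characters back to TEXT offsets.
    start = origin[pos]
    end = origin[pos + len(folded_pattern) - 1] + 1
    return start, end
```

For example, `folded_search("file", "a \ufb01le")' returns (2, 5), i.e.
the three buffer characters "ﬁle". A match can also begin in the middle
of an expansion (searching for "i" inside "ﬁ"); this sketch just reports
the whole ligature, and a real implementation would have to decide what
to do there.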
>> * group-folding-with-regexp-lisp.patch
>>
>> This one takes each input character and either keeps it verbatim or
>> transforms it into a regexp which matches the entire group that this
>> character represents. It is implemented in isearch.
>>
>> + It trivially handles goals 1, 2 and 3. Because regexps are quite
>> versatile, it is the only solution that handles item 3 (it allows each
>> character to match more than a single character).
>
> But the downside is that we will have to construct such regexps for
> all the foldings of all the characters we want to support. That will
> be quite a large database, and a lot of work to construct it.
It's only a tiny bit more work than generating the case-tables, which
are also under discussion. Any information available for constructing
the case tables is also available for building the regexps.
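For reference, the core of the regexp approach fits in a few lines.
This is a Python illustration, not the elisp from
group-folding-with-regexp-lisp.patch, and the `GROUPS' table is a
made-up two-entry sample; a real one would be generated from the same
Unicode data as the case tables. Note that an alternative may be a
multi-character sequence such as "e" followed by U+0301 COMBINING ACUTE
ACCENT, which is what lets this approach cover item 3.

```python
import re

# Hypothetical group table: each base character maps to every string
# that should match it. Entries may be longer than one character,
# e.g. a base letter followed by U+0301 COMBINING ACUTE ACCENT.
GROUPS = {
    "a": ["a", "á", "à", "â", "a\u0301"],
    "e": ["e", "é", "è", "ê", "e\u0301"],
}


def char_to_regexp(ch):
    """Return a regexp matching the whole group of CH, or CH quoted."""
    alternatives = GROUPS.get(ch)
    if alternatives is None:
        return re.escape(ch)
    # Try longer alternatives first, so "e\u0301" is consumed whole
    # instead of matching only the bare "e".
    ordered = sorted(alternatives, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(alt) for alt in ordered) + ")"


def group_folding_regexp(string):
    """Translate a literal search STRING into a group-folding regexp."""
    return "".join(char_to_regexp(c) for c in string)
```

Characters outside any group go through `re.escape', so the rest of the
search string stays literal (a "." in the input still means a period).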
>> * group-folding-with-case-table-lisp.patch
>>
>> This patch is entirely in elisp. I've put it all inside `isearch.el'
>> for now, for the sake of simplicity, but it's not restricted to
>> isearch.
>>
>> It creates a new case-table which performs group folding by borrowing
>> the case-folding machinery, so it is very fast. Then, group folding
>> can be achieved by running the search inside a `with-group-folding'
>> macro. There's also an example implementation which turns it on for
>> isearch by default.
>>
>> + It immediately satisfies items 1, 2, 4, and 5.
>> + It is very fast.
>> - It has no simple way of achieving item 3.
>
> It could use a separate case-table for item 3, couldn't it?
Not that I can tell. You need to tell Emacs either to (1) ignore the
acute accent entirely, or (2) fold the "a´" pair of characters into
"a". Case tables just can't do this right now, AFAIK.
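A small Python model of the case-table trick shows both sides. It is
only an analogy, not Emacs's actual case-table machinery, and
`CASE_LIKE_TABLE', `translate' and `table_search' are made-up names.
Every character translates to exactly one character, so the translated
text has the same length as the original and offsets carry over
unchanged (which is why this is cheap). But a decomposed "a" + U+0301
can never become plain "a", because the accent would have to translate
to nothing.

```python
# Hypothetical one-to-one table in the spirit of a case table: every
# character translates to exactly one character.
CASE_LIKE_TABLE = {"á": "a", "à": "a", "é": "e", "è": "e"}


def translate(text, table=CASE_LIKE_TABLE):
    """Translate TEXT one character at a time, like a case table.
    The result always has the same length as TEXT."""
    return "".join(table.get(ch, ch) for ch in text)


def table_search(pattern, text, table=CASE_LIKE_TABLE):
    """Return the index of PATTERN in TEXT after translation, or -1.
    Since translation preserves length, the index is valid in TEXT."""
    return translate(text, table).find(translate(pattern, table))
```

With this, "ab" is found in the precomposed "áb" but not in the
decomposed "a" + U+0301 + "b": the combining accent has to map to some
single character, so it always stays between the "a" and the "b".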
> I think we will need separate tables for different foldings anyway,
> because each use case calls for some specific folding. In isearch,
> the user will have to specify which foldings she wants to be in
> effect.
Yes, multiple tables are fine and will be done regardless of the approach taken.
>> - If the user decides to set `group-fold-search' to t, this can break
>> existing code (a disadvantage that the lisp version above does not
>> have).
>> - It adds two extra fields to every buffer object (the boolean
>> variable and the char table).
>
> I'm not sure we need to add these tables to the buffer object. The
> experience with using case-tables this way is not encouraging, because
> in several important cases it is not at all clear which buffer is
> relevant to the folding-match operation one needs to do.
Yes, I don't like this either. I was treading unknown waters here, so
I just tried to stay as close as possible to what `case-fold-search'
does.
>> Do any of these options seem good enough? Which would you all like to explore?
>> I like the second one best, but goal 3 is quite important.
>
> I think we must lift the limitation of single-character folding
> result, which means changes on the C level are inevitable.
I agree this is important. But if no one takes it up, I'd rather have
single-character folding than none at all.
> I also think we need to talk a bit more about which kinds of folding
> we would like to support.
What do you mean? Which folding subsets to provide by default?