Re: Character group folding in searches


From: Artur Malabarba <bruce.connor.am@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Character group folding in searches
Date: Fri, 6 Feb 2015 14:18:39 -0200
Message-ID: <CAAdUY-L0E5XQwSAwRrvU-4VtmYCwNKyHB8y5EaoUXtOctE4HBw@mail.gmail.com>
In-Reply-To: <83zj8rcdpi.fsf@gnu.org>

> The full set of "folding" transformations is described in the Unicode
> technical report UTR #30.  It was withdrawn, but its last draft is
> still enlightening.
>
> I think we should support some subset of what's described there.
>
> The way to do it IMO is to generate a set of char-tables where each
> character is mapped to its folded variant,
> one char-table for each subset of folding.

Although the attached patches define only one table for now, they all
support multiple tables (even the one that's not based on char-tables),
so this detail shouldn't be an obstacle. We can decide later which
subset of foldings to provide by default.

> A character whose folding is not a single
> character should map to a vector or a string of characters (not sure
> which one is best, we should choose the one that lends itself to the
> most efficient use).
> I think the best approach is to modify search.c to be able to handle
> folding that produces more than a single character.  I think we will
> also need search.c to support several alternative foldings for the
> same search operation.  Making these changes would be relatively easy,

It's certainly doable, but I'm not sure it's easy. The `search_buffer'
function seems pretty focused on handling one char at a time. Having a
single char suddenly turn into two might require significant changes
to the code flow.
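
To illustrate why multi-character folding changes the flow, here is a
rough Python sketch (not the `search_buffer' code, and not taken from
any of the patches). The folding table and both function names are
made up for the example. Once one character can fold into two, the
folded text no longer lines up index-for-index with the buffer, so
match positions have to be mapped back:

```python
# Hypothetical folding table; the entries are assumptions for illustration.
FOLD = {
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S folds to two characters
    "\ufb01": "fi",  # LATIN SMALL LIGATURE FI folds to two characters
}


def fold_with_offsets(text):
    """Fold TEXT and record which original index produced each folded char."""
    folded = []
    offsets = []  # offsets[i] is the index in TEXT that produced folded[i]
    for index, char in enumerate(text):
        for out in FOLD.get(char, char):
            folded.append(out)
            offsets.append(index)
    return "".join(folded), offsets


def folded_find(pattern, text):
    """Return (start, end) of the first folded match in TEXT, or None.

    The returned indices refer to the ORIGINAL text, not the folded one.
    A match that begins or ends inside an expansion is widened to cover
    the whole original character.
    """
    folded_pattern, _ = fold_with_offsets(pattern)
    if not folded_pattern:
        return (0, 0)
    folded_text, offsets = fold_with_offsets(text)
    pos = folded_text.find(folded_pattern)
    if pos < 0:
        return None
    start = offsets[pos]
    end = offsets[pos + len(folded_pattern) - 1] + 1
    return (start, end)
```

With a one-to-one table the `offsets' list would be the identity and
could be dropped, which is roughly the simplification that
single-character folding buys.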

Of course, if someone takes that up, that's great!

>> * group-folding-with-regexp-lisp.patch
>>
>> This one takes each input character and either keeps it verbatim or
>> transforms it into a regexp which matches the entire group that this
>> character represents. It is implemented in isearch.
>>
>> + It trivially handles goals 1, 2 and 3. Because regexps are quite
>> versatile, it is the only solution that handles item 3 (it allows each
>> character to match more than a single character).
>
> But the downside is that we will have to construct such regexps for
> all the foldings of all the characters we want to support.  That will
> be quite a large database, and a lot of work to construct it.

It's only a tiny bit more work than generating the case-tables that
are also under discussion. Any information available for constructing
the case tables is also available for building the regexps.
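
As a rough sketch of that point (in Python rather than elisp, and not
the code from the patch): the same char -> folded-char mapping that
would fill a case-table can be inverted into groups, and each input
character is then either kept verbatim or replaced by a character
class. The table contents and all names below are assumptions made up
for the example, and here every member of a group matches the whole
group, which is only one possible policy.

```python
import re
from collections import defaultdict

# Hypothetical folding data: each character maps to its folded (base) form.
# This is the same information a per-character table would hold.
FOLD_TABLE = {
    "\u00e1": "a",  # a with acute
    "\u00e0": "a",  # a with grave
    "\u00e2": "a",  # a with circumflex
    "\u00e9": "e",  # e with acute
    "\u00e8": "e",  # e with grave
}


def build_groups(fold_table):
    """Invert the table: base character -> set of all characters in its group."""
    groups = defaultdict(set)
    for char, base in fold_table.items():
        groups[base].add(base)
        groups[base].add(char)
    return groups


def char_to_regexp(char, fold_table, groups):
    """Return a character class for CHAR's group, or CHAR escaped verbatim."""
    base = fold_table.get(char, char)
    members = groups.get(base)
    if not members:
        return re.escape(char)
    return "[" + "".join(re.escape(m) for m in sorted(members)) + "]"


def fold_search_regexp(text, fold_table=FOLD_TABLE):
    """Build a regexp that matches TEXT under group folding."""
    groups = build_groups(fold_table)
    return "".join(char_to_regexp(c, fold_table, groups) for c in text)
```

For instance, `fold_search_regexp("cafe")' gives a pattern that matches
both "cafe" and "café", while characters outside the table (like "c"
and "f") are only escaped.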


>> * group-folding-with-case-table-lisp.patch
>>
>> This patch is entirely in elisp. I've put it all inside `isearch.el'
>> for now, for the sake of simplicity, but it's not restricted to
>> isearch.
>>
>> It creates a new case-table which performs group folding by borrowing
>> the case-folding machinery, so it is very fast. Then, group folding
>> can be achieved by running the search inside a `with-group-folding`
>> macro. There's also an example implementation which turns it on for
>> isearch by default.
>>
>> + It immediately satisfies items 1, 2, 4, and 5.
>> + It is very fast.
>> - It has no simple way of achieving item 3.
>
> It could use a separate case-table for item 3, couldn't it?

Not that I can tell. You need to tell Emacs to either (1) ignore the
acute accent entirely, or (2) have the "a´" pair of characters fold
into "a". Case tables just can't do this right now AFAIK.
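
A small Python illustration of the limitation (an analogy, not Emacs
code): a table that maps one character to one character handles the
precomposed "á", but leaves the two-character sequence "a" + combining
acute untouched. I'm assuming here that the pair means "a" followed by
U+0301, and the helper names are made up. Handling the pair takes
something that works on sequences, for instance decomposing and then
dropping combining marks:

```python
import unicodedata

# Per-character table, analogous in spirit to a case table:
# one code point in, one code point out.
CHAR_TABLE = str.maketrans({"\u00e1": "a"})

PRECOMPOSED = "\u00e1"  # single character: a with acute
DECOMPOSED = "a\u0301"  # two characters: a + COMBINING ACUTE ACCENT


def fold_per_char(text):
    """Fold using only a one-to-one character table."""
    return text.translate(CHAR_TABLE)


def fold_sequences(text):
    """Fold by decomposing (NFD) and then removing combining marks.

    This covers both options above: the accent is ignored entirely,
    so the two-character pair ends up as the bare base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))
```

`fold_per_char' folds PRECOMPOSED to "a" but returns DECOMPOSED
unchanged, whereas `fold_sequences' returns "a" for both.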

> I think we will need separate tables for different foldings anyway,
> because each use case calls for some specific folding.  In isearch,
> the user will have to specify which foldings she wants to be in
> effect.

Yes, multiple tables are fine and will be done regardless of the approach taken.

>> - If the user decides to set `group-fold-search' to t, this can break
>> existing code (a disadvantage that the lisp version above does not
>> have).
>> - It adds two extra fields to every buffer object (the boolean
>> variable and the char table).
>
> I'm not sure we need to add these tables to the buffer object.  The
> experience with using case-tables this way is not encouraging, because
> in several important cases it is not at all clear which buffer is
> relevant to the folding-match operation one needs to do.

Yes, I don't like this either. I was treading unfamiliar waters here,
so I just tried to stay as close as possible to what case-fold-search
does.

>> Do any of these options seem good enough? Which would you all like to explore?
>> I like the second one best, but goal 3 is quite important.
>
> I think we must lift the limitation of single-character folding
> result, which means changes on the C level are inevitable.

I agree this is important. But if no one takes it up, I'd rather have
single-character folding than none at all.

> I also think we need to talk a bit more about which kinds of folding
> we would like to support.

What do you mean? Which folding subsets to provide by default?
