From: Artur Malabarba <bruce.connor.am@gmail.com>
To: "Eli Zaretskii" <eliz@gnu.org>,
"Clément Pit--Claudel" <clement.pit@gmail.com>,
emacs-devel <emacs-devel@gnu.org>
Subject: Re: Char-folding: how can we implement matching multiple characters as a single "thing"?
Date: Tue, 1 Dec 2015 14:18:30 +0000
Message-ID: <CAAdUY-JH9=+mFMbqHotjPNKd3RmBfJF1MVqStvYu=18gVhjqvA@mail.gmail.com>
In-Reply-To: <m24mg3upk2.fsf@newartisans.com>
All right. For now, I've gone with Paul's suggestion and just made the
algorithm dumber. It won't catch every single scenario, but that's
better than catching none.
I too agree that the ideal approach would be to implement this
entirely in C, but so far we lack the necessary human effort.
There's also a third option. I posted some code here a while ago that
implemented char-folding by temporarily replacing the
(current-case-table) with a char-fold-table. This was fast, and much
nicer than the current regexps, but it had the limitation of only
being a character-to-character relation. So it couldn't do something
as basic as 'a' matching a decomposed "ä", i.e. 'a' followed by a
combining diaeresis (because that's 1 char matching 2).
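To make the limitation concrete, here is a minimal sketch in Python (not Emacs Lisp, and not the code I posted earlier). The names `build_fold_table` and `fold` are hypothetical, and Unicode NFD decomposition stands in for whatever a real char-fold-table would contain. It shows that a strictly one-to-one table handles the precomposed "ä" but leaves the two-character form untouched.

```python
import unicodedata

def build_fold_table():
    """Build a one-char-to-one-char table: precomposed letter -> base letter."""
    table = {}
    # Assumption: only a small Latin range is covered, for illustration.
    for cp in range(0x00C0, 0x0250):
        decomp = unicodedata.normalize("NFD", chr(cp))
        # Keep entries that decompose into a base char plus combining marks.
        if len(decomp) > 1 and all(unicodedata.combining(c) for c in decomp[1:]):
            table[cp] = decomp[0]
    return table

FOLD_TABLE = build_fold_table()

def fold(text):
    """Apply the table; every character maps to exactly one character."""
    return text.translate(FOLD_TABLE)

precomposed = "\u00e4"   # "ä" as a single code point
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

print(fold(precomposed) == "a")  # the one-to-one case works
print(len(fold(decomposed)))     # still two characters after folding
```

Because the table can only replace one character with one character, it has no way to make the two-character sequence compare equal to a plain "a".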
However, it's possible that we could combine the two solutions, using
this case-table for as much as possible and then using regexps for
anything else. This way the regexp pattern that replaces each input
character would likely be considerably smaller than 45 chars (I'd
guess between 3 and 15 depending on the character).
The number of branches would still scale badly with the input string
size, but the smaller multiplicative factor should give us more leeway
before scaling up to 10k chars.
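A rough sketch of how the combined approach might look, written in Python rather than Emacs Lisp and purely as an illustration. All names here (`fold_char`, `fold_text`, `hybrid_pattern`, `hybrid_search`) are hypothetical, NFD decomposition is an assumed stand-in for the fold table, and the regexp only covers the U+0300..U+036F combining marks. The one-to-one table does most of the work, and the regexp only has to absorb trailing combining marks.

```python
import re
import unicodedata

def fold_char(ch):
    """One-to-one step: reduce a precomposed character to its base character."""
    decomp = unicodedata.normalize("NFD", ch)
    if len(decomp) > 1 and all(unicodedata.combining(c) for c in decomp[1:]):
        return decomp[0]
    return ch

def fold_text(text):
    """Fold every character; the length is preserved, so offsets stay valid."""
    return "".join(fold_char(c) for c in text)

# Regexp fragment for "zero or more combining marks" (assumed range only).
COMBINING = "[\u0300-\u036f]*"

def hybrid_pattern(query):
    """Each non-combining query char becomes: folded char + optional marks."""
    return "".join(
        re.escape(fold_char(c)) + COMBINING
        for c in query
        if not unicodedata.combining(c)
    )

def hybrid_search(query, text):
    """Search the folded text with the small per-character regexp."""
    return re.search(hybrid_pattern(query), fold_text(text))

# "a" now matches both the precomposed and the decomposed "ä".
print(hybrid_search("a", "x\u00e4y").span())
print(hybrid_search("a", "xa\u0308y").span())
```

Since the folding step is character-to-character, the match positions reported on the folded text are also valid positions in the original text, and each input character expands to only a short regexp fragment.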
2015-11-30 21:48 GMT+00:00 John Wiegley <jwiegley@gmail.com>:
>>>>>> Eli Zaretskii <eliz@gnu.org> writes:
>
>> Volunteers are welcome to work on the ultimate solution, which should indeed
>> include normalization of both the search string and the buffer/string text
>> that is searched.
>
> I imagine this would be done iteratively, with caching of what had been
> normalized if we happen to back-track within a certain bound.
>
> Any takers for working on the "ultimate solution"?
>
> John
>