all messages for Emacs-related lists mirrored at yhetil.org
From: Artur Malabarba <bruce.connor.am@gmail.com>
To: "Eli Zaretskii" <eliz@gnu.org>,
	"Clément Pit--Claudel" <clement.pit@gmail.com>,
	emacs-devel <emacs-devel@gnu.org>
Subject: Re: Char-folding: how can we implement matching multiple characters as a single "thing"?
Date: Tue, 1 Dec 2015 14:18:30 +0000
Message-ID: <CAAdUY-JH9=+mFMbqHotjPNKd3RmBfJF1MVqStvYu=18gVhjqvA@mail.gmail.com>
In-Reply-To: <m24mg3upk2.fsf@newartisans.com>

All right. For now, I've gone with Paul's suggestion and just made the
algorithm dumber. It won't catch every single scenario, but that's
better than catching none.

I too agree that the ideal approach would be to implement this
entirely in C, but so far we lack the necessary human effort.

There's also a 3rd option. I posted some code here a while ago that
implemented char-folding by temporarily replacing the
(current-case-table) with a char-fold-table. This was fast, and much
nicer than the current regexps, but it had the limitation of only
being a character-to-character relation. So it couldn't do something
as basic as 'a' matching a decomposed "ä" (because that's 1 char
matching 2: 'a' plus a combining diaeresis).
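
To illustrate the limitation described above, here is a minimal Python
sketch. It is not the Emacs code referred to in this message: FOLD_TABLE,
fold and folded_find are hypothetical names, and the table is a tiny
stand-in for a case-table-like mapping. It only shows why a strictly
one-character-to-one-character table handles a precomposed "ä" but not
the two-character decomposed form.

```python
# Hypothetical one-to-one fold table (an assumption for illustration,
# standing in for a case-table-like character mapping).
FOLD_TABLE = str.maketrans({
    "\u00e4": "a",  # precomposed a with diaeresis
    "\u00e1": "a",  # precomposed a with acute
    "\u00e9": "e",  # precomposed e with acute
})


def fold(text):
    """Map each character to its folded form, one character at a time."""
    return text.translate(FOLD_TABLE)


def folded_find(needle, haystack):
    """Return the index of needle in haystack after folding both, or -1."""
    return fold(haystack).find(fold(needle))


precomposed = "b\u00e4r"   # "ä" stored as a single code point
decomposed = "ba\u0308r"   # 'a' followed by combining diaeresis U+0308

print(folded_find("bar", precomposed))  # 0: the table maps ä to a
print(folded_find("bar", decomposed))   # -1: U+0308 is still in the way
```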

However, it's possible that we could combine the two solutions, using
this case-table for as much as possible and then using regexps for
anything else. This way the regexp pattern that replaces each input
character would likely be considerably smaller than 45 chars (I'd
guess between 3 and 15 depending on the character).
The number of branches would still scale badly with the input string
size, but the smaller multiplicative factor should give us more leeway
before scaling up to 10k chars.



2015-11-30 21:48 GMT+00:00 John Wiegley <jwiegley@gmail.com>:
>>>>>> Eli Zaretskii <eliz@gnu.org> writes:
>
>> Volunteers are welcome to work on the ultimate solution, which should indeed
>> include normalization of both the search string and the buffer/string text
>> that is searched.
>
> I imagine this would be done iteratively, with caching of what had been
> normalized if we happen to back-track within a certain bound.
>
> Any takers for working on the "ultimate solution"?
>
> John
>


