all messages for Emacs-related lists mirrored at yhetil.org
From: Eli Zaretskii <eliz@gnu.org>
To: bruce.connor.am@gmail.com
Cc: clement.pit@gmail.com, emacs-devel@gnu.org
Subject: Re: Char-folding: how can we implement matching multiple characters as a single "thing"?
Date: Tue, 01 Dec 2015 17:50:12 +0200
Message-ID: <837fkykw23.fsf@gnu.org>
In-Reply-To: <CAAdUY-JH9=+mFMbqHotjPNKd3RmBfJF1MVqStvYu=18gVhjqvA@mail.gmail.com>

> Date: Tue, 1 Dec 2015 14:18:30 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> 
> There's also a 3rd option. I posted some code here a while ago that
> implemented char-folding by temporarily replacing the
> (current-case-table) with a char-fold-table. This was fast, and much
> nicer than the current regexps, but it had the limitation of only
> being a character-to-character relation. So it couldn't do something
> as basic as 'a' matching "ä" when the latter is written as "a" plus
> U+0308 COMBINING DIAERESIS (because that's 1 char matching 2).
> 
> However, it's possible that we could combine the two solutions, using
> this case-table for as much as possible and then using regexps for
> anything else. This way the regexp pattern that replaces each input
> character would likely be considerably smaller than 45 chars (I'd
> guess between 3 and 15 depending on the character).
> The number of branches would still scale badly with the input string
> size, but the smaller multiplicative factor should give us more leeway
> before scaling up to 10k chars.

My gut feeling is that if we go to the C level, we should implement
this properly.  Coding another partial solution will almost certainly
bump into some subtle limitations.  In particular, any solution that
requires a literal search to use regexps under the hood will present
restrictions, because it will not play well with other regexp-based
features, like word search and C-M-s itself.
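The hybrid scheme in the quoted text (a one-to-one table for character-to-character folds, with regexp alternation only where one character must match several) can be illustrated with a rough sketch. This is not code from the thread and not how Emacs implements char-folding. It is written in Python for brevity, and `str.translate` stands in for a temporarily installed case table. The tables `SINGLE_FOLD` and `MULTI_FOLD` and the functions `fold_pattern` and `fold_search` are hypothetical names with a handful of illustrative entries.

```python
import re

# Hypothetical one-to-one fold table, standing in for a case table.
# Every entry maps one character to exactly one character, so folding
# preserves string length and match positions stay valid for the original.
SINGLE_FOLD = str.maketrans({
    "\u00e4": "a",  # LATIN SMALL LETTER A WITH DIAERESIS -> a
    "\u00e1": "a",  # LATIN SMALL LETTER A WITH ACUTE -> a
    "\u00e9": "e",  # LATIN SMALL LETTER E WITH ACUTE -> e
    "A": "a",
})

# Hypothetical multi-character equivalents that a 1:1 table cannot express,
# e.g. a base letter followed by a combining mark.
MULTI_FOLD = {
    "a": ["a\u0308"],  # a + COMBINING DIAERESIS
    "e": ["e\u0301"],  # e + COMBINING ACUTE ACCENT
}

def fold_pattern(query):
    """Build a regexp for QUERY that branches only where a multi-character
    equivalent exists; every other character stays a literal."""
    parts = []
    for ch in query.translate(SINGLE_FOLD):
        alts = MULTI_FOLD.get(ch)
        if alts:
            # Try the longer (multi-character) forms first.
            options = sorted([ch] + alts, key=len, reverse=True)
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def fold_search(query, text):
    """Return the (start, end) span of the first folded match of QUERY
    in TEXT, or None if there is no match."""
    # Fold the text 1:1 first; the spans apply unchanged to TEXT.
    folded = text.translate(SINGLE_FOLD)
    m = re.search(fold_pattern(query), folded)
    return m.span() if m else None
```

For example, `fold_search("bar", "b\u00e4r")` returns `(0, 3)` and `fold_search("bar", "ba\u0308r")` returns `(0, 4)`. The per-character pattern stays short because only characters with multi-character equivalents get a branch. It still routes a literal search through a regexp, however, which is the restriction the reply above objects to.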

Thread overview: 18+ messages
2015-11-30 15:54 Char-folding: how can we implement matching multiple characters as a single "thing"? Artur Malabarba
2015-11-30 16:12 ` Paul Eggert
2015-11-30 16:49 ` Clément Pit--Claudel
2015-11-30 17:55   ` Eli Zaretskii
2015-11-30 21:48     ` John Wiegley
2015-12-01 14:18       ` Artur Malabarba
2015-12-01 15:50         ` Eli Zaretskii [this message]
2015-12-01 16:31 ` GIT mirror of Lisp dev sources [was: Char-folding: how can we implement matching...] Drew Adams
2015-12-01 16:43   ` Steinar Bang
2015-12-01 17:14     ` Drew Adams
2015-12-01 17:32   ` Artur Malabarba
2015-12-01 18:03     ` Drew Adams
2015-12-01 18:29       ` Karl Fogel
2015-12-01 18:52         ` Artur Malabarba
2015-12-01 21:18           ` Drew Adams
2015-12-01 23:37             ` Artur Malabarba
2015-12-02  0:14               ` Drew Adams
2015-12-02  0:59                 ` Artur Malabarba
