From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@IRO.UMontreal.CA>
Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org
Subject: Re: Character group folding in searches
Date: Sat, 07 Feb 2015 17:31:57 +0200
Message-ID: <83bnl5buvm.fsf@gnu.org>
In-Reply-To: <jwvsiehwzbh.fsf-monnier+emacs@gnu.org>
> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org
> Date: Sat, 07 Feb 2015 10:02:52 -0500
>
> To me the simplest option is to have a DFA which returns an integer
> (this integer being "the equivalence class number", and which will
> usually be one of the characters in the equivalence class).
>
> Each DFA node could be a char-table. So if all equivalence classes are
> made up of single-chars, the DFA collapses to just a plain-old
> char-table mapping chars to the canonical element of their
> equivalence classes. For 2-char elements, we'll arrange for the
> entry for the first char (in the main char-table) to be not an integer
> but another char-table. Being a DFA, this could easily handle complex
> elements (matching arbitrary regular expressions), tho whether we'd make
> much use of this particular feature is not very important.
I'm sorry, I don't understand how this will solve the use-cases
brought up in this thread. Can you explain?
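
For reference, here is one possible reading of the quoted proposal, as a minimal sketch in Python rather than Emacs Lisp. Plain dicts stand in for char-tables, and the names FOLD and fold are hypothetical. An entry is either the canonical element of an equivalence class or a nested table acting as the next DFA node. The sketch handles only the simple case: if a multi-character walk does not complete, the first character is treated as its own class.

```python
# Hypothetical sketch: dicts stand in for Emacs char-tables.
# A value is either a str (canonical element of the equivalence class)
# or another dict (the next DFA node, for multi-char elements).
FOLD = {
    "\u00e9": "e",                        # e-acute folds to "e"
    "\ufb03": "\ufb03",                   # the ffi ligature is its own canonical element
    "f": {"f": {"i": "\ufb03"}},          # the sequence f, f, i folds to the ligature
}

def fold(text, table):
    """Return the list of equivalence-class elements for TEXT."""
    out = []
    i = 0
    while i < len(text):
        node = table
        j = i
        # Walk nested tables for as long as the next char has an entry.
        while j < len(text) and isinstance(node, dict) and text[j] in node:
            node = node[text[j]]
            j += 1
        if j > i and isinstance(node, str):
            out.append(node)      # reached a class element
            i = j
        else:
            out.append(text[i])   # no complete element: char is its own class
            i += 1
    return out

same = fold("office", FOLD) == fold("o\ufb03ce", FOLD)
```

Under these assumptions, "office" and the same word spelled with the single ligature character fold to the same sequence, so `same` is true.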

The use-cases I have in mind are:

  . exact match -- only exactly the same codepoints match
  . base-character match -- this ignores any combining marks,
    diacriticals, etc.
  . matching ligatures, such as ﬃ and ffi
  . ignoring punctuation, like string-collate-equalp does,
    i.e. "foobar" will match "foo.bar"
  . ignoring isolated zero-width or non-combining marks and
    directional controls

I understand very well how these can be handled by several different
char-tables, but you seem to say that a single char-table can do all
this, and I don't see how.
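
To illustrate the several-tables approach, here is a minimal sketch in Python, not Emacs Lisp. Each use-case is a separate folding function, and the caller chooses which ones to apply. All function names are hypothetical, and Python's unicodedata module stands in for Emacs char-tables. Note that NFKD normalization folds more than ligatures, so it is only an approximation of the ligature case.

```python
import unicodedata

def base_chars(s):
    # Base-character match: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def expand_ligatures(s):
    # Compatibility decomposition turns the ffi ligature into "ffi"
    # (it also folds other compatibility characters).
    return unicodedata.normalize("NFKD", s)

def drop_punctuation(s):
    # Ignore characters in any Unicode punctuation category (P*).
    return "".join(c for c in s if not unicodedata.category(c).startswith("P"))

def drop_format_controls(s):
    # Ignore zero-width and directional controls (category Cf).
    return "".join(c for c in s if unicodedata.category(c) != "Cf")

def matches(a, b, *folds):
    # With no folds this is the exact match: same codepoints only.
    for f in folds:
        a, b = f(a), f(b)
    return a == b
```

For example, `matches("foobar", "foo.bar", drop_punctuation)` holds, while `matches("foobar", "foo.bar")` with no folds does not.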
Also, what does DFA have to do with all this?
> Since some of the nodes in the DFA would likely only handle a very few
> chars specially, we could later improve the representation so that those
> nodes don't use up a whole char-table.
Now I'm completely confused: char-tables don't need this optimization,
as you well know: they already are space-efficient for storing
characters that map to the table's default value. So I probably
misunderstand your whole idea, if it does need such an optimization.
> PS: And this same kind of "char-table extended into a DFA" could be
> useful for syntax-tables in order to provide much more flexible support
> for multi-character comment markers or "paren-like nested elements".
If that's your itch to scratch, I'm impatiently waiting for patches ;-)