all messages for Emacs-related lists mirrored at yhetil.org
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Eli Zaretskii <eliz@gnu.org>
Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org
Subject: Re: Character group folding in searches
Date: Sun, 08 Feb 2015 09:03:23 -0500
Message-ID: <jwvzj8ov7a5.fsf-monnier+emacs@gnu.org>
In-Reply-To: <83bnl5buvm.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 07 Feb 2015 17:31:57 +0200")

> I'm sorry, I don't understand how this will solve the use-cases
> brought up in this thread.  Can you explain?

Every equivalence class selected by such a DFA can match any set of
strings that can be described by a regular expression, so it should be
more than sufficiently powerful.

>   . exact match -- only exactly the same codepoints match

The DFA is trivial, matches any (and only) one-char sequences and
returns the char.
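To make this concrete, here is a minimal Python sketch. It assumes (my reading, not a spec) that a folding DFA is run repeatedly over the text and emits one equivalence-class key each time it accepts, and that two strings match when they fold to the same key sequence. `exact_fold` and `fold_equal` are hypothetical names:

```python
def exact_fold(text):
    """Exact-match folding DFA (hypothetical sketch).

    One start state; any single char leads to an accepting state whose
    output is that char.  Rerunning it over the text gives one key per char.
    """
    keys = []
    for ch in text:
        keys.append(ch)  # START --ch--> accept, emit ch
    return keys


def fold_equal(a, b, fold=exact_fold):
    """Two strings match when they fold to the same key sequence."""
    return fold(a) == fold(b)


print(fold_equal("caf\u00e9", "caf\u00e9"))  # True
print(fold_equal("caf\u00e9", "cafe"))       # False
```

Comparing whole strings keeps the sketch short; an actual search would look for the needle's key sequence inside the buffer's.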

>   . base-character match -- this ignores any combining marks,
>     diacriticals, etc.

Admittedly, less trivial since we have to remember the base char after
matching it, while skipping subsequent combining marks and diacriticals.
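A sketch of the base-character case under the same assumptions (hypothetical names; Python's `unicodedata` stands in for Emacs's character database). After reading a base char the DFA moves to a state that remembers it, loops there on combining marks, and emits the base when anything else shows up. Treating "nonzero canonical combining class" as "combining mark" is an approximation:

```python
import unicodedata


def base_char(ch):
    """Map one char to its base letter (stands in for a char-table lookup).

    Uses canonical decomposition (NFD): U+00E9 becomes "e" + U+0301, so
    its base is "e".  Chars whose decomposition contains anything besides
    combining marks (e.g. Hangul syllables) are left unchanged.
    """
    d = unicodedata.normalize("NFD", ch)
    if all(unicodedata.combining(c) for c in d[1:]):
        return d[0]
    return ch


def base_fold(text):
    """Base-character folding DFA (hypothetical sketch).

    START: read a char, remember its base b, go to AFTER(b).
    AFTER(b): loop on combining marks; on any other char (or end of
    text) accept with output b, leaving that char for the next run.
    """
    keys = []
    i, n = 0, len(text)
    while i < n:
        b = base_char(text[i])            # START --text[i]--> AFTER(b)
        i += 1
        while i < n and unicodedata.combining(text[i]):
            i += 1                        # AFTER(b) --mark--> AFTER(b)
        keys.append(b)                    # accept, emit the remembered base
    return keys


print(base_fold("re\u0301sume\u0301"))                      # ['r', 'e', 's', 'u', 'm', 'e']
print(base_fold("r\u00e9sum\u00e9") == base_fold("resume"))  # True
```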

>   . matching ligatures, such as ffi and ﬃ

Straightforward.
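One way to sketch it (an assumption, not the only design): let the accepting state for a ligature char emit one key per letter of its expansion. The `LIGATURES` table below is hypothetical and covers only a few Latin ligatures:

```python
# Hypothetical ligature table (only a few Latin ligatures shown).
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def ligature_fold(text):
    """Ligature folding: a ligature char emits one key per letter."""
    keys = []
    for ch in text:
        # START --ligature--> accept, emitting its expansion letter by
        # letter; any other char is emitted as itself.
        keys.extend(LIGATURES.get(ch, ch))
    return keys


def fold_contains(needle, haystack, fold):
    """True if needle's key sequence occurs inside haystack's."""
    n, h = fold(needle), fold(haystack)
    return any(h[i:i + len(n)] == n for i in range(len(h) - len(n) + 1))


print(ligature_fold("o\ufb03ce") == ligature_fold("office"))  # True
print(fold_contains("fi", "o\ufb03ce", ligature_fold))        # True
```

Emitting the expansion rather than a single key for the whole ligature means a search for "fi" still finds the tail of "ﬃ".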

>   . ignoring punctuation, like string-collate-equalp does,
>     i.e. "foobar" will match "foo.bar"

Easy: the DFA will simply loop back when it sees a ".".
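A sketch of that self-loop, assuming "punctuation" means the Unicode general categories starting with P (Emacs's own notion could differ; this does not try to reproduce `string-collate-equalp`, which relies on the system's collation):

```python
import unicodedata


def is_punct(ch):
    """Punctuation = Unicode general categories Pc, Pd, Ps, Pe, Pi, Pf, Po."""
    return unicodedata.category(ch).startswith("P")


def punct_fold(text):
    """Fold that ignores punctuation: START has a self-loop on it."""
    keys = []
    for ch in text:
        if is_punct(ch):
            continue        # START --punct--> START, nothing emitted
        keys.append(ch)     # START --ch--> accept, emit ch
    return keys


print(punct_fold("foo.bar") == punct_fold("foobar"))  # True
```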

>   . ignoring isolated zero-width or non-combining marks and
>     directional controls

Same.
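A sketch of this case. Two assumptions: "zero-width and directional controls" is approximated by general category Cf (ZWSP, ZWJ/ZWNJ, LRM/RLM, bidi embeddings and isolates), and "isolated" marks are combining marks seen in the start state, i.e. with no base char before them. Marks that follow a base stay attached to its key:

```python
import unicodedata


def is_format_control(ch):
    """Category Cf: U+200B..U+200F, U+202A..U+202E, U+2066..U+2069, BOM, ..."""
    return unicodedata.category(ch) == "Cf"


def ignorable_fold(text):
    """Fold that skips format controls and isolated combining marks."""
    keys = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        i += 1
        if is_format_control(ch) or unicodedata.combining(ch):
            continue  # START self-loops on controls and isolated marks
        key = ch
        while i < n and unicodedata.combining(text[i]):
            key += text[i]  # marks after a base char stay attached to it
            i += 1
        keys.append(key)
    return keys


print(ignorable_fold("foo\u200bbar") == ignorable_fold("foobar"))  # True
```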

> I understand very well how these can be handled by several different
> char-tables, but you seem to say that a single char-table can do all
> this, and I don't see how.

Not sure what you mean by "single char-table" or why you think I said
something about single-vs-multiple char-tables.

A first implementation of DFAs could use char-tables internally (each
node of the DFA being a char-table), but I think that's entirely
different from what you mean by "different char-tables" or "single
char-table", since you'd choose one DFA (which may have any number of
char-tables inside).
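A sketch of "one DFA, several char-tables inside", with Python dicts standing in for char-tables and a per-node default entry. Strictly it is a small finite-state transducer: an entry may emit output and may leave the current char unconsumed so the next node re-reads it. The node layout and names are hypothetical; the example folds the letter sequence "ffi" to the same key as U+FB03:

```python
# Each node: (table, default).  An entry is (next_node, output, consume):
#   output: string whose chars are emitted as keys (None = the char read).
#   consume: False means the next node re-reads the same char.
START, F1, F2 = 0, 1, 2

NODES = [
    # START: "f" may begin an f-f-i sequence; every other char
    # (including U+FB03 itself) is its own class.
    ({"f": (F1, "", True)}, (START, None, True)),
    # F1: saw "f".
    ({"f": (F2, "", True)}, (START, "f", False)),
    # F2: saw "ff"; "i" completes the ligature.
    ({"i": (START, "\ufb03", True)}, (START, "ff", False)),
]
# Pending output to flush if the text ends in each node.
FLUSH = ["", "f", "ff"]


def run_dfa(text, nodes=NODES, flush=FLUSH):
    """Run the DFA over text and return the emitted keys."""
    keys = []
    state, i = START, 0
    while i < len(text):
        ch = text[i]
        table, default = nodes[state]
        state, out, consume = table.get(ch, default)
        keys.extend(ch if out is None else out)
        if consume:
            i += 1
    keys.extend(flush[state])
    return keys


print(run_dfa("office") == run_dfa("o\ufb03ce"))  # True
```

Non-consuming entries only lead back to START, which always consumes, so the loop terminates.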

> Now I'm completely confused: char-tables don't need this optimization,
> as you well know: they already are space-efficient for storing
> characters that map to the table's default value.  So I probably
> misunderstand your whole idea, if it does need such an optimization.

A DFA can have hundreds of nodes (hence hundreds of char-tables if we
use char-tables for that), most of which map one or two chars to
a special value while all others are mapped to "the default", so there
can be significant gains from using a more specialized representation.
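A sketch of such a specialized representation, assuming a node is stored as a short tuple of explicit transitions plus a default and scanned linearly, with a fallback to a full table for busier nodes. `SparseNode`, `DictNode`, `compact` and the `max_sparse` threshold are all hypothetical:

```python
class SparseNode:
    """Compact DFA node: a few explicit transitions plus a default."""
    __slots__ = ("chars", "targets", "default")

    def __init__(self, transitions, default):
        self.chars = tuple(transitions)            # e.g. ("f",)
        self.targets = tuple(transitions.values())
        self.default = default

    def lookup(self, ch):
        for c, t in zip(self.chars, self.targets):
            if c == ch:
                return t
        return self.default


class DictNode:
    """Full-table node (a dict standing in for a char-table)."""
    __slots__ = ("table", "default")

    def __init__(self, table, default):
        self.table = dict(table)
        self.default = default

    def lookup(self, ch):
        return self.table.get(ch, self.default)


def compact(table, default, max_sparse=4):
    """Use the sparse form when a node maps only a few chars specially."""
    if len(table) <= max_sparse:
        return SparseNode(table, default)
    return DictNode(table, default)


node = compact({"f": 1}, 0)
print(type(node).__name__, node.lookup("f"), node.lookup("x"))  # SparseNode 1 0
```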

>> PS: And this same kind of "char-table extended into a DFA" could be
>> useful for syntax-tables in order to provide much more flexible support
>> for multi-character comment markers or "paren-like nested elements".
> If that's your itch to scratch, I'm impatiently waiting for patches ;-)

It's been in the back of my mind for many years.


        Stefan



