From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Character group folding in searches
Date: Mon, 09 Feb 2015 17:40:44 +0200
Message-ID: <83k2zr9jpf.fsf@gnu.org>
References: <CAAdUY-L8ipk4Aj83hJErinrgODjJab+mhx==59=FjnfmFm_wjw@mail.gmail.com>
	<83zj8rcdpi.fsf@gnu.org> <jwv61begbs3.fsf-monnier+emacs@gnu.org>
	<83k2zudfqk.fsf@gnu.org> <jwvvbjede3l.fsf-monnier+emacs@gnu.org>
	<83bnl6d8d6.fsf@gnu.org> <jwvlhkay4hq.fsf-monnier+emacs@gnu.org>
	<831tm2cdm3.fsf@gnu.org> <jwvsiehwzbh.fsf-monnier+emacs@gnu.org>
	<83bnl5buvm.fsf@gnu.org> <jwvzj8ov7a5.fsf-monnier+emacs@gnu.org>
	<83sieg9pzy.fsf@gnu.org> <jwvzj8nyel3.fsf-monnier+emacs@gnu.org>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
X-Trace: ger.gmane.org 1423496476 9464 80.91.229.3 (9 Feb 2015 15:41:16 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 9 Feb 2015 15:41:16 +0000 (UTC)
Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org
To: Stefan Monnier <monnier@iro.umontreal.ca>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Feb 09 16:41:11 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1YKqST-00075X-Vh
	for ged-emacs-devel@m.gmane.org; Mon, 09 Feb 2015 16:41:10 +0100
Original-Received: from localhost ([::1]:33514 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1YKqST-0000vV-5q
	for ged-emacs-devel@m.gmane.org; Mon, 09 Feb 2015 10:41:09 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:37755)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1YKqSO-0000v8-W7
	for emacs-devel@gnu.org; Mon, 09 Feb 2015 10:41:06 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1YKqSK-0005VN-0D
	for emacs-devel@gnu.org; Mon, 09 Feb 2015 10:41:04 -0500
Original-Received: from mtaout23.012.net.il ([80.179.55.175]:60646)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1YKqSJ-0005T7-OQ
	for emacs-devel@gnu.org; Mon, 09 Feb 2015 10:40:59 -0500
Original-Received: from conversion-daemon.a-mtaout23.012.net.il by
	a-mtaout23.012.net.il (HyperSendmail v2007.08) id
	<0NJI00C00GI3JP00@a-mtaout23.012.net.il> for
	emacs-devel@gnu.org; Mon, 09 Feb 2015 17:40:58 +0200 (IST)
Original-Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout23.012.net.il
	(HyperSendmail v2007.08) with ESMTPA id
	<0NJI00C70GW9FT70@a-mtaout23.012.net.il>;
	Mon, 09 Feb 2015 17:40:58 +0200 (IST)
In-reply-to: <jwvzj8nyel3.fsf-monnier+emacs@gnu.org>
X-012-Sender: halo1@inter.net.il
X-detected-operating-system: by eggs.gnu.org: Solaris 10
X-Received-From: 80.179.55.175
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:182696
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/182696>

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: bruce.connor.am@gmail.com,  emacs-devel@gnu.org
> Date: Sun, 08 Feb 2015 22:03:08 -0500
>=20
> > Char-tables are efficient, and at least for decomposition they se=
em to
> > be the perfect vehicle.  DFAs that come out of arbitrary regexps,
> > OTOH, can sometimes be very inefficient.  That's why I tend to th=
ink
> > about this in terms of char-tables.
>=20
> That's a false dichotomy.

Actually, it's not a dichotomy at all.  I just explained why
char-tables seem to be a good basis on which to build this feature.

> DFA is about *recognizing* multi-char entities.  If the input
> entities you care about are only single-char (as is the case for
> decomposition), then your DFA will degenerate to a single char-tabl=
e
> (as is the case now).

I think we have a miscommunication here.  I was talking about the
tables that are part of a DFA that drive its state machine.  Those
tables might become large and sparse, certainly if the input symbol
can be any Unicode character, most of which only match themselves.

I guess I'm still struggling to understand your idea of using DFAs.
E.g., you talk about each node of a DFA being a char-table, but AFAIK
a DFA node is just a state of the automaton, so how can that be
expressed as a char-table?  And above you are saying that a "DFA will
degenerate to a single char-table", which again is a stumbling block
for me, since a DFA is more than a table.  What am I missing?

> But how do you use current char-tables to handle multi-char input
> entities (i.e. to recognize things like "=3D>")?

I don't understand the question, sorry.  The simple answer is that a
char-table entry can be any Lisp object, including a string, but you
already know that.

If you mean how to compare "=3D>" with "=E2=87=92", then the latter w=
ill be
"folded" to the former using a char-table, and then the results will
be compared, either as strings or character by character.  Is this
what you were asking?

> > Who and how will create such a DFA?
>=20
> They'd be mechanically constructed (by hand-written code), for exam=
ple
> driven by the existing Unicode tables.

What would be the input language for specifying such a DFA?  I mean,
how would we specify which sequence of states are acceptable (yieldin=
g
a match for the search) and which aren't?