From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: Character group folding in searches Date: Mon, 09 Feb 2015 17:40:44 +0200 Message-ID: <83k2zr9jpf.fsf@gnu.org> References: <83zj8rcdpi.fsf@gnu.org> <83k2zudfqk.fsf@gnu.org> <83bnl6d8d6.fsf@gnu.org> <831tm2cdm3.fsf@gnu.org> <83bnl5buvm.fsf@gnu.org> <83sieg9pzy.fsf@gnu.org> Reply-To: Eli Zaretskii NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE X-Trace: ger.gmane.org 1423496476 9464 80.91.229.3 (9 Feb 2015 15:41:16 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 9 Feb 2015 15:41:16 +0000 (UTC) Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org To: Stefan Monnier Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Feb 09 16:41:11 2015 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1YKqST-00075X-Vh for ged-emacs-devel@m.gmane.org; Mon, 09 Feb 2015 16:41:10 +0100 Original-Received: from localhost ([::1]:33514 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YKqST-0000vV-5q for ged-emacs-devel@m.gmane.org; Mon, 09 Feb 2015 10:41:09 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:37755) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YKqSO-0000v8-W7 for emacs-devel@gnu.org; Mon, 09 Feb 2015 10:41:06 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YKqSK-0005VN-0D for emacs-devel@gnu.org; Mon, 09 Feb 2015 10:41:04 -0500 Original-Received: from mtaout23.012.net.il ([80.179.55.175]:60646) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YKqSJ-0005T7-OQ for emacs-devel@gnu.org; Mon, 09 Feb 2015 10:40:59 -0500 Original-Received: from conversion-daemon.a-mtaout23.012.net.il by a-mtaout23.012.net.il (HyperSendmail v2007.08) id <0NJI00C00GI3JP00@a-mtaout23.012.net.il> for emacs-devel@gnu.org; Mon, 09 Feb 2015 17:40:58 +0200 (IST) Original-Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout23.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NJI00C70GW9FT70@a-mtaout23.012.net.il>; Mon, 09 Feb 2015 17:40:58 +0200 (IST) In-reply-to: X-012-Sender: halo1@inter.net.il X-detected-operating-system: by eggs.gnu.org: Solaris 10 X-Received-From: 80.179.55.175 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:182696 Archived-At: > From: Stefan Monnier > Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org > Date: Sun, 08 Feb 2015 22:03:08 -0500 >=20 > > Char-tables are efficient, and at least for decomposition they se= em to > > be the perfect vehicle. DFAs that come out of arbitrary regexps, > > OTOH, can sometimes be very inefficient. That's why I tend to th= ink > > about this in terms of char-tables. >=20 > That's a false dichotomy. Actually, it's not a dichotomy at all. I just explained why char-tables seem to be a good basis on which to build this feature. > DFA is about *recognizing* multi-char entities. If the input > entities you care about are only single-char (as is the case for > decomposition), then your DFA will degenerate to a single char-tabl= e > (as is the case now). I think we have a miscommunication here. I was talking about the tables that are part of a DFA that drive its state machine. Those tables might become large and sparse, certainly if the input symbol can be any Unicode character, most of which only match themselves. I guess I'm still struggling to understand your idea of using DFAs. E.g., you talk about each node of a DFA being a char-table, but AFAIK a DFA node is just a state of the automaton, so how can that be expressed as a char-table? And above you are saying that a "DFA will degenerate to a single char-table", which again is a stumbling block for me, since a DFA is more than a table. What am I missing? > But how do you use current char-tables to handle multi-char input > entities (i.e. to recognize things like "=3D>")? I don't understand the question, sorry. The simple answer is that a char-table entry can be any Lisp object, including a string, but you already know that. If you mean how to compare "=3D>" with "=E2=87=92", then the latter w= ill be "folded" to the former using a char-table, and then the results will be compared, either as strings or character by character. Is this what you were asking? > > Who and how will create such a DFA? >=20 > They'd be mechanically constructed (by hand-written code), for exam= ple > driven by the existing Unicode tables. What would be the input language for specifying such a DFA? I mean, how would we specify which sequence of states are acceptable (yieldin= g a match for the search) and which aren't?