From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: Character group folding in searches Date: Mon, 09 Feb 2015 19:39:24 +0200 Message-ID: <834mqv9e7n.fsf@gnu.org> References: <83zj8rcdpi.fsf@gnu.org> <83k2zudfqk.fsf@gnu.org> <83bnl6d8d6.fsf@gnu.org> <831tm2cdm3.fsf@gnu.org> <83bnl5buvm.fsf@gnu.org> <83sieg9pzy.fsf@gnu.org> <83k2zr9jpf.fsf@gnu.org> Reply-To: Eli Zaretskii NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE X-Trace: ger.gmane.org 1423503634 9161 80.91.229.3 (9 Feb 2015 17:40:34 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 9 Feb 2015 17:40:34 +0000 (UTC) Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org To: Stefan Monnier Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Feb 09 18:40:34 2015 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1YKsJz-0003xk-6w for ged-emacs-devel@m.gmane.org; Mon, 09 Feb 2015 18:40:31 +0100 Original-Received: from localhost ([::1]:34245 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YKsJy-0006Ql-PF for ged-emacs-devel@m.gmane.org; Mon, 09 Feb 2015 12:40:30 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:41258) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YKsJw-0006P5-1j for emacs-devel@gnu.org; Mon, 09 Feb 2015 12:40:29 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YKsJs-0007KW-P7 for emacs-devel@gnu.org; Mon, 09 Feb 2015 12:40:28 -0500 Original-Received: from mtaout28.012.net.il ([80.179.55.184]:44319) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YKsJs-0007JY-Ay for emacs-devel@gnu.org; Mon, 09 Feb 2015 12:40:24 -0500 Original-Received: from conversion-daemon.mtaout28.012.net.il by mtaout28.012.net.il (HyperSendmail v2007.08) id <0NJI00C00LYL4G00@mtaout28.012.net.il> for emacs-devel@gnu.org; Mon, 09 Feb 2015 19:37:54 +0200 (IST) Original-Received: from HOME-C4E4A596F7 ([87.69.4.28]) by mtaout28.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NJI008M8MB5BH40@mtaout28.012.net.il>; Mon, 09 Feb 2015 19:37:54 +0200 (IST) In-reply-to: X-012-Sender: halo1@inter.net.il X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 80.179.55.184 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:182713 Archived-At: > From: Stefan Monnier > Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org > Date: Mon, 09 Feb 2015 11:33:29 -0500 >=20 > > I guess I'm still struggling to understand your idea of using DFA= s. > > E.g., you talk about each node of a DFA being a char-table, but A= FAIK > > a DFA node is just a state of the automaton, so how can that be >=20 > A DFA node is a state with labeled arcs going out to other states. >=20 > It's usually implemented as a "table" (array, hash-table, char-tabl= e, ...) > that maps the labels to the next state. >=20 > Does it make more sense, now? Yes. (Your "node" seems to be both a node and some part of the transition function, which is not the usual terminology, AFAIR.) > >> But how do you use current char-tables to handle multi-char inpu= t > >> entities (i.e. to recognize things like "=3D>")? > > I don't understand the question, sorry. The simple answer is tha= t a > > char-table entry can be any Lisp object, including a string, but = you > > already know that. >=20 > That doesn't tell me how you'd use it. Would the ?=3D char be mapp= ed to > a list of strings (one of them being "=3D>") and then you'd check i= f the > next (few) chars match one of those strings? > What I suggest is to map the ?=3D char to another char-table which = then > maps the ?> char to (say) ?=E2=87=92. I thought doing it the other way around, starting with ?=E2=87=92. > > If you mean how to compare "=3D>" with "=E2=87=92", then the latt= er will be > > "folded" to the former using a char-table, >=20 > [ I always get confused by this terminology since "folding" to me > implies making things smaller, so I'd call it "unfolding" in that > direction. ] >=20 > > and then the results will be compared, either as strings or chara= cter > > by character. Is this what you were asking? >=20 > But how would this handle an equivalence class that includes both "= =3D>" > and "->"? Why should we? "->" could be equivalent to ?=E2=86=92, but I see no = reason to make it equivalent to "=3D>". > >> > Who and how will create such a DFA? > >> They'd be mechanically constructed (by hand-written code), for e= xample > >> driven by the existing Unicode tables. > > What would be the input language for specifying such a DFA? I me= an, > > how would we specify which sequence of states are acceptable (yie= lding > > a match for the search) and which aren't? >=20 > Depends. For the Unicode-defined equivalence classes, we'd use the > Unicode tables directly and build the DFA nodes from it without goi= ng > through some intermediate "specification". >=20 > For other cases, we could specify the DFA with a list of strings. > Or with regular expressions. And the DFA will be in C or in Lisp? Anyway, I hope to see something like that landing on master, preferably sooner than later.