From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Jonathan Walther Newsgroups: gmane.emacs.devel Subject: Re: correct way to do state machines in elisp? Date: Sat, 24 Aug 2002 12:34:40 -0700 Sender: emacs-devel-admin@gnu.org Message-ID: <20020824193440.GA27712@reactor-core.org> References: <20020824043521.GD20524@reactor-core.org> NNTP-Posting-Host: localhost.gmane.org Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="BOKacYhQ+x31HxR3" X-Trace: main.gmane.org 1030218134 1894 127.0.0.1 (24 Aug 2002 19:42:14 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Sat, 24 Aug 2002 19:42:14 +0000 (UTC) Cc: emacs-devel@gnu.org Return-path: Original-Received: from quimby.gnus.org ([80.91.224.244]) by main.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 17ignV-0000UR-00 for ; Sat, 24 Aug 2002 21:42:13 +0200 Original-Received: from monty-python.gnu.org ([199.232.76.173]) by quimby.gnus.org with esmtp (Exim 3.12 #1 (Debian)) id 17ihHU-00038e-00 for ; Sat, 24 Aug 2002 22:13:12 +0200 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.10) id 17igok-0002Rf-00; Sat, 24 Aug 2002 15:43:30 -0400 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10) id 17igmS-0002LK-00 for emacs-devel@gnu.org; Sat, 24 Aug 2002 15:41:08 -0400 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10) id 17iglx-0002Jz-00 for emacs-devel@gnu.org; Sat, 24 Aug 2002 15:41:08 -0400 Original-Received: from 00-50-04-4b-bf-28.bconnected.net ([209.53.16.55]) by monty-python.gnu.org with smtp (Exim 4.10) id 17iglw-0002Ju-00 for emacs-devel@gnu.org; Sat, 24 Aug 2002 15:40:37 -0400 Original-Received: (qmail 27792 invoked by uid 1038); 24 Aug 2002 19:34:40 -0000 Original-To: Oliver Scholz Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Errors-To: emacs-devel-admin@gnu.org X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.0.11 Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Emacs development discussions. List-Unsubscribe: , List-Archive: Xref: main.gmane.org gmane.emacs.devel:6851 X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:6851 --BOKacYhQ+x31HxR3 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Thank you! The enclosed explanation is very helpful. Jonathan On Sat, Aug 24, 2002 at 09:29:12PM +0200, Oliver Scholz wrote: >It seems that seems that my follow-up did not get through to the >mailing-list. So I send it to you privately. > >Jonathan Walther writes: >[...] >> As I was writing it, I couldn't help thinking, there must be a way to >> write the same thing in LISP, with 1/3 the lines of code. The largest >> guzzler of lines of code I think is the state machine that parses every >> RFC compliant IRC message into a suitable structure. >> >> Can anyone give me hints about how the same thing could be done better >> in elisp? I'm reluctant to use regular expressions for this. >[...] > >I once asked for advice on state machines in comp.lang.lisp and got a >very helpful answer by Paul Foley. Now I asked for and got his >permission to forward his mail. The first part explains the basics of >state machines for a clueless non-programmer without CS-background >like me. But later on there are some examples in Lisp (CL, actually). > >I hope this helps. > >To: Oliver Scholz >Subject: Re: [newbie] state machines >Content-Type: text/plain; charset=3DUS-ASCII >From: Paul Foley >Date: 12 Jun 2002 03:23:28 +1200 > >On Tue, 11 Jun 2002 14:16:23 +0200, Oliver Scholz wrote: > >> I found state machines sometimes mentioned in usenet articles or on >> the web and I understand that there are different kinds (?) of them: >> finite and infinite, deterministic and non-deterministic. I don't >> really understand this issue, but I want to learn more about this. > >A state machine is a "machine" (usually simulated by a computer >program) that has "states". It takes some input, figures out what >state it should go into next, based on the input(s), and goes into >that state, then repeats. > >A deterministic state machine is one in which the "next state" >function is deterministic -- i.e., for any given input and current >state, there's only one possible next state. > >A non-deterministic state machine is one in which the "next state" >function is non-deterministic -- i.e., there's more than one choice >for next state, for at least one combination of input and current >state. > >A finite state machine is one that has a finite number of states. >[Which is all of them -- there are no infinite state machines] > >A very simple example is a traffic light: it has 3 states: green, >orange, and red, and a single-bit "change colour now" input. When >it's in the "green" state, the input changes it to the "orange" state; >when it's in the "orange" state, the input changes it to the "red" >state, and when it's in the "red" state, the input changes it to the >"green" state. > >A computer-related example with which you're probably familiar is the >simple regular expression [real regexps are DFSMs (deterministic >finite state machines); the things called "regexp" in languages like >Perl are not actually regexps -- they're a more powerful class of >languages]. Let's say you have the regexp "a(bx|cy)*d": that means it >matches input consisting of an "a" followed by any number of "bx" or >"cy" pairs, in any order, followed by a "d". The state machine has 5 >states: let's number them starting from 1 -- it starts off in state 1. >In state 1, an input of "a" moves it to state 2; any other input is an >error (i.e., doesn't match the regexp). In state 2, if the input is a >"b" it moves to state 3, and if it's a "c" is moves to state 4, and if >it's a "d" it moves to state 5 (anything else is an error). In state >3, an input of "x" moves it to state 2. In state 4, an input of "y" >moves it to state 2. If you reach state 5, you're done. > >You can draw a picture of it like this: > > x > ,-----------. > | | > | b | > | ,-->[3]--' > a v / d > -->[1]--->[2]---->[[5]] > ^ \ > | `-->[4]--. > | c | > | y | > `-----------' > > [n] represents a state, and the arrows represent transitions; the > letters on the arrows tell you what the input has to be for that > transition to trigger. [[n]] represents a "final" state (there can > possibly be more than one, but for a regexp you can always draw it > with only one final state) > >A NDFSM (non-deterministic ...) has some states that look like > > x > ,--->[B] > / > [A] > \ x > `--->[C] > >with the same letter on two or more transitions -- you can't tell >which one is the 'right one' [if you take the top one, to state B, and >later hit an "error" condition (where there's no suitable transition >and you're not in a final state), you have to back up and try the >lower one, because it might have led you to a final state without >error] > >> My notion is that state machines are useful whenever I have to parse >> input from a stream. I don't know if this statement is correct. My > >They're useful for everything you do on a computer -- a computer is >really just a finite state machine (with a very large number of >potential states). > >> I want to learn a) the (fundamentals of) the concept, b) when to use >> them, c) how to construct them in a proper way and d) how to implement >> them in Lisp. > >There are several ways to implement them. For reasonably small >numbers of states, you can write a TAGBODY with the states as tags, >using GO to change states. Or you can use a variable for the state >number and a CASE statement dispatching on that variable for the code, >or you can write the states and transitions as data, and have a >function that walks over that data -- that's better for a large number >of states... > >For example, the "a(bx|cy)*d" machine could be written > > (defun match-expr-1 (stream &aux input (state 1)) > (loop > (setq input (read-char stream nil nil)) > (case state > (1 (ecase input > (#\a (setq state 2))) > (2 (ecase input > (#\b (setq state 3)) > (#\c (setq state 4)) > (#\d (return t)))) ; i.e., state 5 -- done > (3 (ecase input > (#\x (setq state 2)))) > (4 (ecase input > (#\y (setq state 2)))))))) > >or > > (defstruct state > number > transitions > final-p) > > (defstruct transition > input > next-state) > > (defun run-fsm (stream state) > (loop > (if (state-final-p state) > (return t) > (let ((transition (find (read-char stream nil nil) > (state-transitions state) > :key #'transition-input))) > (if (null transition) > (return nil) > (setq state (transition-next-state transition))))))) > > > (defun add-transition (input from-state to-state) > (push (make-transition :input input :next-state to-state) > (state-transitions from-state))) > > (defun match-expr-2 (stream) > (let ((state1 (make-state :number 1)) > (state2 (make-state :number 2)) > (state3 (make-state :number 3)) > (state4 (make-state :number 4)) > (state5 (make-state :number 5 :final-p t))) > (add-transition #\a state1 state2) > (add-transition #\b state2 state3) > (add-transition #\c state2 state4) > (add-transition #\d state2 state5) > (add-transition #\x state3 state2) > (add-transition #\y state4 state2) > (run-fsm stream state1))) > >or whatever [of course, there's no need to rebuild all the states >every time, in the last example] > >A similar solution -- and the usual way of implementing it in languages >like C -- is to use a table. Assume a, b, c, d, x and y are the only >valid inputs, for simplicity; then a table like > > State "a" "b" "c" "d" "x" "y" > 1 2 NIL NIL NIL NIL NIL > 2 NIL 3 4 T NIL NIL > 3 NIL NIL NIL NIL 2 NIL > 4 NIL NIL NIL NIL NIL 2 > >describes the machine (a number means go to that state, T and NIL mean >return T or NIL, respectively), so you could write > > (defun read-input (stream) > (ecase (read-char stream nil nil) > (#\a 0) (#\b 1) (#\c 2) (#\d 3) (#\x 4) (#\y 5))) > > (defun run-fsm2 (stream table &aux (state 0)) > (loop > (let ((value (aref table state (read-input stream)))) > (if (integerp value) > (setq state value) > (return value))))) > > (defun match-expr-3 (stream) > (run-fsm2 #2A(( 1 nil nil nil nil nil) > (nil 2 3 t nil nil) > (nil nil nil nil 1 nil) > (nil nil nil nil nil 1)))) > >[Note the renumbering of states to count from 0 instead of 1] > >Does that help? > >--=20 >Oh dear god. In case you weren't aware, "ad hominem" is not latin for >"the user of this technique is a fine debater." > -- Thomas F. Burdick >(setq reply-to > (concatenate 'string "Paul Foley " "")) > > > > -- Oliver > >--=20 >Oliver Scholz 7 Fructidor an 210 de la R?volution >Taunusstr. 25 Libert?, Egalit?, Fraternit?! >60329 Frankfurt a. M. http://www.jungdemokratenhessen.de >Tel. (069) 97 40 99 42 http://www.jdjl.org --=20 Geek House Productions, Ltd. Providing Unix & Internet Contracting and Consulting, QA Testing, Technical Documentation, Systems Design & Implementation, General Programming, E-commerce, Web & Mail Services since 1998 Phone: 604-435-1205 Email: djw@reactor-core.org Webpage: http://reactor-core.org Address: 2459 E 41st Ave, Vancouver, BC V5R2W2 --BOKacYhQ+x31HxR3 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.7 (GNU/Linux) iQCVAwUBPWff0MK9HT/YfGeBAQFVtQQAkiCUWmStBujOBUcvMOgEXfhXdtNfw4lN 5wzelLwMkrOSnQV/e1cjuxX4w0ZMh7xx99fP6J9Ysug/EsvohIJ3VLO4H198skiA 6OtiTCVLiT+5q1IYW9VNwTQSX64KYhRfCZgxj6/cbESfoV9hihEJthj6qL0JWzFn DkxyMLUvUmM= =cSKb -----END PGP SIGNATURE----- --BOKacYhQ+x31HxR3--