Re: correct way to do state machines in elisp?

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

From: Jonathan Walther <krooger@debian.org>
Cc: emacs-devel@gnu.org
Subject: Re: correct way to do state machines in elisp?
Date: Sat, 24 Aug 2002 12:34:40 -0700	[thread overview]
Message-ID: <20020824193440.GA27712@reactor-core.org> (raw)
In-Reply-To: <m3elcoukc7.fsf@ID-87814.user.dfncis.de>

[-- Attachment #1: Type: text/plain, Size: 10050 bytes --]

Thank you!  The enclosed explanation is very helpful.

Jonathan

On Sat, Aug 24, 2002 at 09:29:12PM +0200, Oliver Scholz wrote:
>It seems that seems that my follow-up did not get through to the
>mailing-list. So I send it to you privately.
>
>Jonathan Walther <krooger@debian.org> writes:
>[...]
>> As I was writing it, I couldn't help thinking, there must be a way to
>> write the same thing in LISP, with 1/3 the lines of code.  The largest
>> guzzler of lines of code I think is the state machine that parses every
>> RFC compliant IRC message into a suitable structure.
>>
>> Can anyone give me hints about how the same thing could be done better
>> in elisp?  I'm reluctant to use regular expressions for this.
>[...]
>
>I once asked for advice on state machines in comp.lang.lisp and got a
>very helpful answer by Paul Foley. Now I asked for and got his
>permission to forward his mail. The first part explains the basics of
>state machines for a clueless non-programmer without CS-background
>like me. But later on there are some examples in Lisp (CL, actually).
>
>I hope this helps.
>

>To: Oliver Scholz <alkibiades@gmx.de>
>Subject: Re: [newbie] state machines
>Content-Type: text/plain; charset=US-ASCII
>From: Paul Foley <mycroft@actrix.gen.nz>
>Date: 12 Jun 2002 03:23:28 +1200
>
>On Tue, 11 Jun 2002 14:16:23 +0200, Oliver Scholz wrote:
>
>> I found state machines sometimes mentioned in usenet articles or on
>> the web and I understand that there are different kinds (?) of them:
>> finite and infinite, deterministic and non-deterministic.  I don't
>> really understand this issue, but I want to learn more about this.
>
>A state machine is a "machine" (usually simulated by a computer
>program) that has "states".  It takes some input, figures out what
>state it should go into next, based on the input(s), and goes into
>that state, then repeats.
>
>A deterministic state machine is one in which the "next state"
>function is deterministic -- i.e., for any given input and current
>state, there's only one possible next state.
>
>A non-deterministic state machine is one in which the "next state"
>function is non-deterministic -- i.e., there's more than one choice
>for next state, for at least one combination of input and current
>state.
>
>A finite state machine is one that has a finite number of states.
>[Which is all of them -- there are no infinite state machines]
>
>A very simple example is a traffic light: it has 3 states: green,
>orange, and red, and a single-bit "change colour now" input.  When
>it's in the "green" state, the input changes it to the "orange" state;
>when it's in the "orange" state, the input changes it to the "red"
>state, and when it's in the "red" state, the input changes it to the
>"green" state.
>
>A computer-related example with which you're probably familiar is the
>simple regular expression [real regexps are DFSMs (deterministic
>finite state machines); the things called "regexp" in languages like
>Perl are not actually regexps -- they're a more powerful class of
>languages].  Let's say you have the regexp "a(bx|cy)*d": that means it
>matches input consisting of an "a" followed by any number of "bx" or
>"cy" pairs, in any order, followed by a "d".  The state machine has 5
>states: let's number them starting from 1 -- it starts off in state 1.
>In state 1, an input of "a" moves it to state 2; any other input is an
>error (i.e., doesn't match the regexp).  In state 2, if the input is a
>"b" it moves to state 3, and if it's a "c" is moves to state 4, and if
>it's a "d" it moves to state 5 (anything else is an error).  In state
>3, an input of "x" moves it to state 2.  In state 4, an input of "y"
>moves it to state 2.  If you reach state 5, you're done.
>
>You can draw a picture of it like this:
>
>                   x
>             ,-----------.
>             |           |
>             |    b      |
>             |  ,-->[3]--'
>         a   v / d
>  -->[1]--->[2]---->[[5]]
>             ^ \
>             |  `-->[4]--.
>             |    c      |
>             |      y    |
>             `-----------'
>
> [n] represents a state, and the arrows represent transitions; the
> letters on the arrows tell you what the input has to be for that
> transition to trigger.  [[n]] represents a "final" state (there can
> possibly be more than one, but for a regexp you can always draw it
> with only one final state)
>
>A NDFSM (non-deterministic ...) has some states that look like
>
>         x
>       ,--->[B]
>      /
>   [A]
>      \  x
>       `--->[C]
>
>with the same letter on two or more transitions -- you can't tell
>which one is the 'right one' [if you take the top one, to state B, and
>later hit an "error" condition (where there's no suitable transition
>and you're not in a final state), you have to back up and try the
>lower one, because it might have led you to a final state without
>error]
>
>> My notion is that state machines are useful whenever I have to parse
>> input from a stream. I don't know if this statement is correct. My
>
>They're useful for everything you do on a computer -- a computer is
>really just a finite state machine (with a very large number of
>potential states).
>
>> I want to learn a) the (fundamentals of) the concept, b) when to use
>> them, c) how to construct them in a proper way and d) how to implement
>> them in Lisp.
>
>There are several ways to implement them.  For reasonably small
>numbers of states, you can write a TAGBODY with the states as tags,
>using GO to change states.  Or you can use a variable for the state
>number and a CASE statement dispatching on that variable for the code,
>or you can write the states and transitions as data, and have a
>function that walks over that data -- that's better for a large number
>of states...
>
>For example, the "a(bx|cy)*d" machine could be written
>
>  (defun match-expr-1 (stream &aux input (state 1))
>    (loop
>      (setq input (read-char stream nil nil))
>      (case state
>        (1 (ecase input
>             (#\a (setq state 2)))
>        (2 (ecase input
>             (#\b (setq state 3))
>             (#\c (setq state 4))
>             (#\d (return t))))   ; i.e., state 5 -- done
>        (3 (ecase input
>             (#\x (setq state 2))))
>        (4 (ecase input
>             (#\y (setq state 2))))))))
>
>or
>
>  (defstruct state
>    number
>    transitions
>    final-p)
>
>  (defstruct transition
>    input
>    next-state)
>
>  (defun run-fsm (stream state)
>    (loop
>      (if (state-final-p state)
>          (return t)
>          (let ((transition (find (read-char stream nil nil)
>                                  (state-transitions state)
>                                  :key #'transition-input)))
>            (if (null transition)
>                (return nil)
>                (setq state (transition-next-state transition)))))))
>
>
>  (defun add-transition (input from-state to-state)
>    (push (make-transition :input input :next-state to-state)
>          (state-transitions from-state)))
>
>  (defun match-expr-2 (stream)
>    (let ((state1 (make-state :number 1))
>          (state2 (make-state :number 2))
>          (state3 (make-state :number 3))
>          (state4 (make-state :number 4))
>          (state5 (make-state :number 5 :final-p t)))
>      (add-transition #\a state1 state2)
>      (add-transition #\b state2 state3)
>      (add-transition #\c state2 state4)
>      (add-transition #\d state2 state5)
>      (add-transition #\x state3 state2)
>      (add-transition #\y state4 state2)
>      (run-fsm stream state1)))
>
>or whatever [of course, there's no need to rebuild all the states
>every time, in the last example]
>
>A similar solution -- and the usual way of implementing it in languages
>like C -- is to use a table.  Assume a, b, c, d, x and y are the only
>valid inputs, for simplicity; then a table like
>
> State  "a"   "b"   "c"  "d"  "x"  "y"
>   1     2    NIL   NIL  NIL  NIL  NIL
>   2    NIL    3     4    T   NIL  NIL
>   3    NIL   NIL   NIL  NIL   2   NIL
>   4    NIL   NIL   NIL  NIL  NIL   2
>
>describes the machine (a number means go to that state, T and NIL mean
>return T or NIL, respectively), so you could write
>
>  (defun read-input (stream)
>    (ecase (read-char stream nil nil)
>      (#\a 0) (#\b 1) (#\c 2) (#\d 3) (#\x 4) (#\y 5)))
>
>  (defun run-fsm2 (stream table &aux (state 0))
>    (loop
>      (let ((value (aref table state (read-input stream))))
>        (if (integerp value)
>            (setq state value)
>            (return value)))))
>
>  (defun match-expr-3 (stream)
>    (run-fsm2 #2A(( 1  nil nil nil nil nil)
>                  (nil  2   3   t  nil nil)
>                  (nil nil nil nil  1  nil)
>                  (nil nil nil nil nil  1))))
>
>[Note the renumbering of states to count from 0 instead of 1]
>
>Does that help?
>
>-- 
>Oh dear god.  In case you weren't aware, "ad hominem" is not latin for
>"the user of this technique is a fine debater."
>                                                      -- Thomas F. Burdick
>(setq reply-to
>  (concatenate 'string "Paul Foley " "<mycroft" '(#\@) "actrix.gen.nz>"))
>
>

>
>    -- Oliver
>
>-- 
>Oliver Scholz               7 Fructidor an 210 de la R?volution
>Taunusstr. 25               Libert?, Egalit?, Fraternit?!
>60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
>Tel. (069) 97 40 99 42      http://www.jdjl.org


-- 
                     Geek House Productions, Ltd.

  Providing Unix & Internet Contracting and Consulting,
  QA Testing, Technical Documentation, Systems Design & Implementation,
  General Programming, E-commerce, Web & Mail Services since 1998

Phone:   604-435-1205
Email:   djw@reactor-core.org
Webpage: http://reactor-core.org
Address: 2459 E 41st Ave, Vancouver, BC  V5R2W2

[-- Attachment #2: Type: application/pgp-signature, Size: 307 bytes --]

next prev parent reply	other threads:[~2002-08-24 19:34 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-08-24  4:35 correct way to do state machines in elisp? Jonathan Walther
2002-08-24 16:52 ` Alex Schroeder
     [not found] ` <m3elcoukc7.fsf@ID-87814.user.dfncis.de>
2002-08-24 19:34   ` Jonathan Walther [this message]
2002-08-25  5:27 ` Richard Stallman
2002-08-25  7:32   ` linuxos and GNU (Was: Re: correct way to do state machines in elisp?) Jonathan Walther

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020824193440.GA27712@reactor-core.org \
    --to=krooger@debian.org \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).