From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Oliver Scholz Newsgroups: gmane.emacs.devel Subject: Re: XPM via Lisp in the toolbar Date: Wed, 13 Nov 2002 18:15:36 +0100 Sender: emacs-devel-admin@gnu.org Message-ID: References: NNTP-Posting-Host: main.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: main.gmane.org 1037209154 32057 80.91.224.249 (13 Nov 2002 17:39:14 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Wed, 13 Nov 2002 17:39:14 +0000 (UTC) Return-path: Original-Received: from quimby.gnus.org ([80.91.224.244]) by main.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 18C1Tq-0008KV-00 for ; Wed, 13 Nov 2002 18:39:10 +0100 Original-Received: from monty-python.gnu.org ([199.232.76.173]) by quimby.gnus.org with esmtp (Exim 3.12 #1 (Debian)) id 18C1fp-0001nI-00 for ; Wed, 13 Nov 2002 18:51:33 +0100 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.10) id 18C1Ru-0006u0-00; Wed, 13 Nov 2002 12:37:10 -0500 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10) id 18C1D1-0001XM-00 for emacs-devel@gnu.org; Wed, 13 Nov 2002 12:21:47 -0500 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10) id 18C1Cs-0001Vp-00 for emacs-devel@gnu.org; Wed, 13 Nov 2002 12:21:46 -0500 Original-Received: from [213.165.64.20] (helo=mail.gmx.net) by monty-python.gnu.org with smtp (Exim 4.10) id 18C1Cl-0001Sx-00 for emacs-devel@gnu.org; Wed, 13 Nov 2002 12:21:32 -0500 Original-Received: (qmail 14124 invoked by uid 0); 13 Nov 2002 17:21:10 -0000 Original-Received: from dialin-145-254-191-210.arcor-ip.net (HELO HERMES) (145.254.191.210) by mail.gmx.net (mp009-rz3) with SMTP; 13 Nov 2002 17:21:10 -0000 X-Mailer: emacs 21.2.1 (via feedmail 11-beta-1 Q) Original-To: emacs-devel@gnu.org In-Reply-To: (Oliver Scholz's message of "Tue, 12 Nov 2002 18:32:25 +0100") X-Operating-System: GNU/Linux, built from scratch: http://www.linuxfromscratch.org/ X-Attribution: os X-Face: "HgH2sgK|bfH$;PiOJI6|qUCf.ve<51_Od(%ynHr?=>znn#~#oS>",F%B8&\vus),2AsPYb -n>PgddtGEn}s7kH?7kH{P_~vu?]OvVN^qD(L)>G^gDCl(U9n{:d>'DkilN!_K"eNzjrtI4Ya6;Td% IZGMbJ{lawG+'J>QXPZD&TwWU@^~A}f^zAb[Ru;CT(UA]c& User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-msvc-nt5.1.2600) Errors-To: emacs-devel-admin@gnu.org X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.0.11 Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Emacs development discussions. List-Unsubscribe: , List-Archive: Xref: main.gmane.org gmane.emacs.devel:9394 X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:9394 --=-=-= Oliver Scholz writes: > Some time ago I posted a library to deal with XPM images (internally: > convert them to PPM) to this group. I am currently working on this > again [...] Maybe I should post my code, too (attached). I am finished with xpm.el so far, except that I yet have to check whether transparency does work on the X Windowing system -- it sure does not work on MS Windows XP. So far my code provides a function `xpm-create-image' that returns an image descriptor. This image descriptor contains a string containing PPM image data as the value of the :data attribute and a value for the :mask attribute that matches pixels for which the XPM image specified transparency. Thus Emacs should be able to display the image correctly if it has proper image support at all. The question remains how to hook this into the rest of Emacs. The simple way would be to do nothing but add this package (provided that you like it, of course). Then packages like gamegrid could use `xpm-create-image' directly. But this would lead to Emacs supporting XPM in some cases and in other cases not. Or I could hook `xpm-create-image' into the appropriate functions in image.el. I have already done the work needed, save some more extensive testing. Emacs would then always support XPM, albeit a little bit slower if it was not compiled with support for XPM on the C level. I think this is nicer and more consistent. But what to do with the tool bar? On MS Windows this breaks the tool bar totally. And whether the additional delay at startup is endurable or not is another question. It is not notable on my new computer, but it might be on older machines (I don't know). The delay at startup would not be an issue anymore, of course, if we convert the XPM icons in lisp/toolbar/ to PPM as Dave Love suggested. Or I could add a caching mechanism to toolbar.el so that the conversion happens only once. This would allow to keep the XPM icons and get rid of the PBM icons instead, because the XPM format allows to tweak the image for different display types. --=-=-= Content-Type: application/emacs-lisp Content-Disposition: attachment; filename=xpm.el Content-Transfer-Encoding: 8bit ;;; xpm.el --- Convert XPM images ;; Copyright (C) 2002 Free Software Foundation, Inc. ;; Author: Oliver Scholz ;; Keywords: multimedia ;; This file is free software; you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation; either version 2, or (at your option) ;; any later version. ;; This file is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with GNU Emacs; see the file COPYING. If not, write to ;; the Free Software Foundation, Inc., 59 Temple Place - Suite 330, ;; Boston, MA 02111-1307, USA. ;;; Commentary: ;; This package provides functions to deal with XPM-images if Emacs is ;; compiled without XPM-support on the C-level. ;; The main function is `xpm-create-image'. It returns an image ;; descriptor, similar to `create-image'. ;; This aims to implement XPM version 3. There might be some issues ;; with earlier versions of the XPM format. But I could not find a ;; specification for them. ;; Internally this package converts the XPM data to the PPM ;; format. But this is not the whole story. The function ;; `xpm-create-image' returns an image descriptor as functions like ;; `insert-image' use it. This descriptor contains the image data in ;; PPM format as the value for the :data property. Unlike PPM, XPM ;; allows for a transparent "colour". `xpm-create-image' deals with ;; this by choosing a unoccupied colour for the affected pixels and ;; setting the :mask property in the image descriptor accordingly. ;; Some short and vague notes about the image formats involved: ;; XPM is a textual format. It was invented to provide icons and ;; images on widgets for the X Windowing System. The syntax of the ;; format is C-like. (I guess it is possible to cut&paste an XPM file ;; as it is into C-code, but not knowing C I am not sure about ;; this. -- O.S.) In the header section of the XPM colours are ;; assigned to sequences of one or more ascii characters. The actual ;; pixmap is defined by listing those characters row by row. This ;; makes XPM potentially human readable, because if the image is small ;; enough and if the characters are well chosen with regard to shades ;; of gray, the XPM source looks like an ascii art graphic of the ;; specified image. Ignoring syntax, the XPM format consists of three ;; major parts: 1. a declaration of the width, height, number of ;; colours and of the lenght of the sequence of chars used to specify ;; a colour in the pixmap -- all of them in that orders as decimal ;; numbers in a single string. 2. the mapping of characters or ;; character sequences to colours -- in several strings, one for each ;; colour. 3. the pixmap itself -- one string for each row. Besides ;; being textual and possibly human readable, one main advantage of ;; XPM is that the format allows to assign colours differently to ;; characters or character sequences for colour, grayscale and mono ;; displays. Thus one single XPM source may provide a colour, ;; grayscale or black&white image. This package aims to implement ;; this transparently by choosing the right colour assignment for the ;; current display type. ;; PPM is a part of a family of image formats, sometimes altogether ;; called "PNM". Together with its sister formats it was invented as ;; an _intermediate_ image format for image conversion with command ;; line tools like netpbm or pbmplus. The main advantage of each of ;; these formats is simplicity. Each PNM format comes in two versions: ;; a binary one and a textual one. For completeness here is a table of ;; the PNM formats: ;; Format Type Magic Number (Textual) Magic Number (Binary) ;; ;; PBM black&white P1 P4 ;; PGN grayscale P2 P5 ;; PPM colour P3 P6 ;; This package employs PPM in the binary format (P6). For simplicity ;; it does so even if it converts the XPM for a mono or grayscale ;; display. Each PNM file consists of the following parts, each of ;; them is separated by whitespace: the magic number, the width of the ;; image in pixels, the height of the image in pixels, the maximum ;; value per colour component, and the actual pixmap specification ;; itself (depending on the format). The binary format of PPM ;; specifies the pixmap as a sequence of bytes. Each three bytes ;; specifiy the red, green and blue value of a pixel's colour ;; respectively. ;;; Code: ;;; When parsing the header of the XPM, we put the colours as RGB ;;; values appropriate for the current display type into a hash table. (defun xpm-colour-values (colour) "Return the RGB-value of COLOUR as a list of three integers. The integers range from 0 to 255. COLOUR is a string, either the name of a colour as used by the X windowing system (\"Firebrick\") or a HTML-type colour specification \(\"#002f5d\")." ;; The function `colour-values' returns a list of RGB values up to ;; 65535, resulting in 2.815e+14 possible colours. It is -- I assume ;; -- possible to support that many colours using the ascii format ;; for PPM (magic number "P3") rather than the binary format (magic ;; number "P6"), as we do in this library. The binary format uses ;; one byte per RGB-colour-value, resulting in 16.777.216 possible ;; colours, which should be enough for a fall-back like this ;; package. (let ((col (mapcar (lambda (c) (truncate (sqrt c))) (color-values colour)))) ;; We preserve pure black for transparency. Maybe this is too ugly ;; a kludge? The Right Thing (tm) would be to find an unused ;; colour in the colour hash to denote transparent pixels. But ;; OTOH this approach is much simpler and I doubt that any human ;; being's eyes could see the difference between #000000 and ;; #000001. (if (equal col '(0 0 0)) '(0 0 1) col))) (defun xpm-read-string () "Read the next string in the current buffer. Thereby skipping any leading garbage as well as strings within a comment or escaped quotation marks." (run-state-machine () (start (?\\ nil t skip-next-char-1) (?/ nil t commentary?) (?\" nil t read-string) (t nil t start)) ;; If the last char was a `\', we simply skip the next char in ;; order to deal with escape sequences. Then we proceed looking ;; for a string. (skip-next-char-1 (t nil t start)) ;; If the char before was a `/' we are possibly at the start of a ;; commentary. (commentary? (?* nil t skip-commentary) (t nil t start)) ;; Skip a commentary. (skip-commentary (?\\ nil t skip-next-char-2) (?* nil t end-of-commentary?) (t nil t skip-commentary)) ;; Skip an escaped char. (skip-next-char-2 (t nil t skip-commentary)) ;; If the last char was a `*', this could be the end of the ;; commentary section. (end-of-commentary? (?/ nil t start) (?* nil t end-of-commentary?) (t nil t skip-commentary)) ;; Read a string. (read-string (?\\ nil t skip-next-char-3) (?\" nil t exit) ; EXIT (t t t read-string)) ; add char to the returned string ;; Skip an escaped char. (skip-next-char-3 (t t t read-string)))) (defun xpm-parse-spec-string-1 (string) "Return XPM colour specification as an alist. STRING is a string containing a sequence of display-type symbols and colours. The elements of the returned alist are cons cells whose car is the display type symbol and whose cdr is the colour." (let (symb list aux1 aux2) (with-temp-buffer (insert string) (goto-char (point-min)) (skip-syntax-forward "-") ;; We get the spec as a string in the form ;; SYMBOL COLOUR {SYMBOL COLOUR } ;; Where SYMBOL may be either "s", "c", "g" odr "m". COLOUR ;; may be either a HTML-like hex value for the rgb (for ;; example "#01cf40") or an X colour name. Unfortunately the ;; latter one may contain whitespace, so we have to be ;; careful: When reading COLOUR an s, c, g or m after a ;; whitespace character may be either a SYMBOL or the first ;; char of the second word of the name of the colour (for ;; example in "c dark slategray"). (run-state-machine (output (push (cons symb output) list)) ;; When the state machine terminates, because it encounters ;; the end of the buffer, the current value of `output' is the ;; last COLOUR read. ;; Get the first symbol. (get-symbol ((?c ?g ?m ?s) (lambda (o i)(setq symb i) nil) t get-colour-start) ((?\t ?\ ) nil t get-symbol) ; Skip whitespace. (t (lambda (&rest ignore) (error "Could not parse colour spec: %S" string)))) ;; Start reading COLOUR: skip leading whitespace. (get-colour-start ((?\t ?\ ) nil t get-colour-start) ; Skip whitespace. (t t t get-colour)) ;; Read COLOUR until the next whitespace char. (get-colour ((?\t ?\ ) ;; Maybe the name of the colour is finished. Set ;; `aux' to the whitespace char. (lambda (o i) (setq aux1 i) o) ; Don't change the value of the output variable. t get-colour-whitespace) (t t t get-colour)) (get-colour-whitespace ((?\t ?\ ) nil t get-colour-whitespace) ; Skip. ((?c ?g ?m ?s) (lambda (o i) (setq aux2 i) o) t symb?) ;; False alert: concatenated `output', ;; aux and the input last read and ;; proceed reading the colour. (t (lambda (o i) (concat o aux1 i)) t get-colour)) ;; The last char was either an s, c, g or m. If the current ;; char is a whitespace char, it was a SYMBOL, otherwise it is ;; part of COLOUR. (symb? ((?\t ?\ ) (lambda (o i) ;; Push a cons cell of SYMBOL and COLOUR to `list'. (push (cons symb o) list) ;; Set `symb' to the new SYMBOL. (setq symb aux2) ;; Set `output' to nil. nil) t get-colour-start) ;; False alert: Concatenate the stuff together again ;; and proceed reading COLOUR. (t (lambda (o i) (concat o aux1 aux2 i)) t get-colour)))) list)) (defun xpm-parse-spec-string (string nchars) "Translate an XPM-colour spec to a list. The return value is a list, whose car is a string indicating the sequence of chars used to denote a colour in the XPM and whose cdr is a number of pairs of XPM-keysymbols and colour-names." (let ((chars (substring string 0 nchars))) (cons chars (xpm-parse-spec-string-1 (substring string (1+ nchars)))))) (defun xpm-display-type () "Return a string indicating the current display type." (cond ((display-color-p) "c") ((display-grayscale-p) "g") (t "m"))) (defun xpm-find-colour (spec &optional colour-symbols) "Find the appropriate colour for the current display. SPEC is the XPM specification for a single colour as a list of strings as returned by `xpm-parse-spec-string'. The XPM format allows to assign different colours for different display types to each character-sequence in the XPM pixmap specification by using keysymbols. \"c\" stands for \"coloured display\", \"g\" for \"grayscale\" and \"m\" for \"monochrome display\". The keysymbol \"s\" allows a programm to assign a colour at runtime. The return value is a cons cell of the string indicating the sequence of chars used in the XPM to denote a colour and a list of integers indicating the RGB values of the colour matching the current display type. The optional second argument COLOUR-SYMBOLS, if non-nil, is an alist of symbols/colour-names as required to make use of the special symbol `s' in the colour data of the XPM. If SPEC defines `s' for the colour-char and if the alist COLOUR-SYMBOLS contains a matcher for the colour symbol following `s', the according colour is used, thus overriding the defaults for the current display type (if any)." ;; First we try if there is the key-char "s" defined in the spec and ;; if there is a matching colour in `colour-symbols'. Then we try ;; if there is a colour matching the current display-type (colour, ;; grayscale or b&w). Then we try -- in this order -- "grayscale", ;; "colour" and "mono". (let* ((symb (cdr (assoc "s" spec))) (colour (cdr (or (and symb (assoc symb colour-symbols)) (assoc (xpm-display-type) spec) (assoc "g" spec) (assoc "c" spec) (assoc "m" spec))))) (cons (car spec) (if (string-match "None" colour) ;; "None" stands for "transparent colour". ;; We use '(0 0 0) as "transparent colour". '(0 0 0) (xpm-colour-values colour))))) (defun xpm-make-colour-hash (colours) "Create, initialize and return a hash table containing the colour-specs. The keys are strings indicating the character-sequence used in the XPM image to denote a colour. Each value is a lists of three integers specifying the RGB values. The first argument COLOURS should be an association list of strings/RGB-list used to initialize the hash table. The second argument MASK is a list of RGB-values indicating a colour that is _not_ used in COLOURS." (let ((hash (make-hash-table :test 'equal))) (dolist (elt colours hash) (puthash (car elt) (cdr elt) hash)))) ;; To keep things separated we use a structure as an intermediate data ;; structure between the parsing functions and the functions creating ;; an image. ;; `defstruct' generates a bunch of ugly byte-compiler warnings, so we ;; do this manually. (defun make-xpm-pixmap (&rest keyword-args) "Create and return a `xpm-pixmap'. This is a structure to store image data. Accepted keyword-args to set the slots listed together with their accesor functions: :width -- `xpm-pixmap-width' The width of the image. :height -- `xpm-pixmap-height' The height of the image. :mask -- `xpm-pixmap-mask' The colour used in the image to indicate transparency. This is a list of the three values for red, green and blue. :pixmap -- `xpm-pixmap-pixmap' The actual pixmap-data." (let ((keyword-list '((:width . 1) (:height . 2) (:mask . 3) (:pixmap . 4))) (vect (make-vector 5 nil))) (aset vect 0 'xpm-pixmap) (while keyword-args (aset vect (cdr (assq (pop keyword-args) keyword-list)) (pop keyword-args))) vect)) (defun xpm-barf-if-not-pixmap (object) "Signal an error, if OBJECT is not of the type `xpm-pixmap'." (let ((last-func (backtrace-frame 3))) (unless (and (vectorp object) (eq (aref object 0) 'xpm-pixmap)) (error "Function `%s' trying to access a non-xpm-pixmap object." (nth 1 last-func))))) (defun xpm-pixmap-width (pixmap) "Return the width of xpm-pixmap PIXMAP." (xpm-barf-if-not-pixmap pixmap) (aref pixmap 1)) (defun xpm-pixmap-height (pixmap) "Return the height of xpm-pixmap PIXMAP." (xpm-barf-if-not-pixmap pixmap) (aref pixmap 2)) (defun xpm-pixmap-mask (pixmap) "Return the mask of xpm-pixmap PIXMAP." (xpm-barf-if-not-pixmap pixmap) (aref pixmap 3)) (defun xpm-pixmap-pixmap (pixmap) "Return the width of xpm-pixmap PIXMAP." (xpm-barf-if-not-pixmap pixmap) (aref pixmap 4)) (defun xpm-parse-buffer (buffer &optional colour-symbols) "Parse a buffer containing XPM data and return a pixmap object." (with-current-buffer buffer ;; Get the general information about the XPM. The line we are ;; scanning first has the following format: ;; " WIDTH HEIGHT NUMBER-OF-COLOURS CHARS-PER-COLOUR " (let* ((header (split-string (xpm-read-string))) (width (string-to-number (car header))) (height (string-to-number (nth 1 header))) (ncolours (string-to-number (nth 2 header))) (nchars (string-to-number (nth 3 header))) (clist nil)) ;; Parse the colours. (dotimes (ignore-me ncolours clist) (push (xpm-find-colour (xpm-parse-spec-string (xpm-read-string) nchars) colour-symbols) clist)) ;; Initialize the colour-hash. (let* ((colour-hash (xpm-make-colour-hash clist)) ;; Parse the image. (pdata (xpm-parse-pixmap width height nchars colour-hash))) (make-xpm-pixmap :width width :height height :pixmap pdata))))) ;; This is the function consuming the most time: (defun xpm-parse-pixmap (width height nchars colour-hash) "Read the section of an XPM string or file defining the pixmap. Return an array containing integers from 0 to 255. Each three integers define the colour of a pixel by specifying the value for red, green and blue respectively." (let ((pdata (make-vector (* width height 3) 0)) (count-y 0) (count-x 0) (chars-read 0) (pointer 0) colour cstring) (while (< count-y height) (search-forward "\"") (while (< count-x width) (while (< chars-read nchars) (setq chars-read (1+ chars-read)) (setq cstring (concat cstring (char-to-string (char-after)))) (forward-char)) (setq colour (gethash cstring colour-hash)) (aset pdata pointer (car colour)) (setq pointer (1+ pointer)) (aset pdata pointer (car (cdr colour))) (setq pointer (1+ pointer)) (aset pdata pointer (car (cdr (cdr colour)))) (setq cstring nil pointer (1+ pointer) chars-read 0 count-x (1+ count-x))) (setq count-x 0 count-y (1+ count-y)) (forward-char)) pdata)) (defun xpm-pixmap-to-ppm-string (pic) "Return a data string for a PPM image in the P6 format. PIC must be an xpm-pixmap object as returned by `make-xpm-pixmap'." (concat "P6 " (number-to-string (xpm-pixmap-width pic)) " " (number-to-string (xpm-pixmap-height pic)) " 255 " ;; The pixmap slot contains a vector of integers. We ;; concatenate it to a string containing binary data. (xpm-pixmap-pixmap pic))) (defun xpm-pixmap-to-image (pic &rest props) "Return an actual image from a pixmap object. Optional PROPS are additional image attributes to assign to the image, like, e.g. `:ascent center'." ;; If no :mask property was passed to this function, we add it ;; ourselves. (unless (plist-member props :mask) (push '(heuristic (0 0 0)) props) (push :mask props)) (apply 'create-image (xpm-pixmap-to-ppm-string pic) 'pbm t props)) (defun xpm-strip-props (props) "Strip property list PROPS. If `xpm-create-image' was called by `find-image' or `create-image' the property list may contain specifactions for :file, :data or :type, which have become superfloous." (let ((prop-list (copy-sequence props)) elt list) (while prop-list (setq elt (pop prop-list)) (if (memq elt '(:file :data :type)) ;; Remove the value, too. (pop prop-list) (push elt list))) (nreverse list))) ;;;###autoload (defun xpm-create-image (file-or-data &optional data-p &rest props) "Create an image from XPM data. FILE-OR-DATA is an XPM image file name or a string containing XPM image data. Optional DATA-P non-nil means FILE-OR-DATA is a string containing image data. Optional PROPS are additional image attributes to assign to the image, like, e.g. `:mask MASK'. Since this function deals only with XPM images, the :type attribute is ignored. Value is the image created. Internally this function converts the XPM data to the PPM format. It returns an image descriptor with correct :data and :mask property. The conversion is done with Lisp functions and works even if Emacs was compiled without proper support for XPM images." ;; Interface and docstring of this function are modeled after ;; `create-image'. (with-temp-buffer (if data-p (insert file-or-data) (insert-file-contents-literally file-or-data)) (goto-char (point-min)) (apply 'xpm-pixmap-to-image (xpm-parse-buffer (current-buffer) (plist-get props :color-symbols)) (xpm-strip-props props)))) (provide 'xpm) ;;; xpm.el ends here --=-=-= xpm.el needs another package named state-m.el that provides an easy way to write a state machine for reader functions: --=-=-= Content-Type: application/emacs-lisp Content-Disposition: attachment; filename=state-m.el Content-Transfer-Encoding: 8bit ;;; state-m.el --- Macro for defining a state machine ;; Copyright (C) 2002 Free Software Foundation, Inc. ;; Author: Oliver Scholz ;; Keywords: lisp ;; This file is free software; you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation; either version 2, or (at your option) ;; any later version. ;; This file is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with GNU Emacs; see the file COPYING. If not, write to ;; the Free Software Foundation, Inc., 59 Temple Place - Suite 330, ;; Boston, MA 02111-1307, USA. ;;; Commentary: ;; This package provides a macro `run-state-machine' which is supposed ;; to make the writing of simple and fast reader functions easy. ;; A state machine is defined as a number of states, each of which ;; defines one or more transitions to another state. The syntax for ;; state definitions is: ;; ( NAME TRANSITION {TRANSITION} ) ;; Name is a symbol that serves as the referer to that state in ;; transitions. The definition of a transition looks like this: ;; ( MATCHER ADD-TO-RESULT ADVANCE NEXT-STATE ) ;; MATCHER may be a character, a string, specifying a regexp, a cons ;; cell of two characters specifying a character range, a list of ;; characters or character ranges specifying alternative matches, or ;; the symbol t indicating a default transition. ;; ADD-TO-RESULT may be nil, t or a function. ;; ADVANCE may be nil, t or a function. ;; NEXT-STATE is a symbol: the name of a state. ;;; Code: ;; If a transition has a cons cell or a list as MATCHER, it is ;; expanded into as many transitions with a single character as ;; MATCHER as necessary to cover the range/alternative thus described. ;; So (... ((?a . ?c) nil t next-state) ...) becomes ;; (... (?a nil t next-state) (?b nil t next-state) (?c nil t next-state) ...) (defun state-m-expand-char-range (range) "Expand char-range into a list of characters. The argument RANGE is of the form (STARTING-CHAR . END-CHAR); the return value is a list of all characters between STARTING-CHAR and END-CHAR (both included). This functions is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." (and (or (and (consp range) (numberp (car range)) (numberp (cdr range))) (error "Not a valid char-range: %s" range)) (or (< (car range) (cdr range)) (error "A char-range must start with the lower char: %s" range)) (let ((list nil)) (dotimes (i (1+ (- (cdr range) (car range))) (nreverse list)) (push (+ (car range) i) list))))) (defun state-m-expand-transition (trans) "Expand a transition with char-alternatives into a list of transitions. The argument TRANS is a transition (a list). The car of a transition is called MATCHER. If MATCHER is a regexp, a single character or the symbol t, this function returns the transition unmodified. If MATCHER is a character-alternative, the transition is \"expanded\". A character-alternative may be a cons cell specifying a range of characters or a list of cons cells or single characters. The cdr of each transition in the returned list is the cdr of TRANS and their car is a single character within the range specified by the car of TRANS. This functions is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." ;; The car of trans may be either an atom, a char-range (a cons ;; cell) or a list of character alternatives. (cond ((atom (car trans)) trans) ((nlistp (cdar trans)) ;; The car of trans is a cons cell. (mapcar (lambda (elt) (cons elt (cdr trans))) (state-m-expand-char-range (car trans)))) (t ;; It is a list. (mapcar (lambda (char) (cons char (cdr trans))) (apply 'nconc ;; We use `nconc' to splice the return value of ;; `state-ma-expand-char-range' into the list of ;; characters. (mapcar (lambda (elt) (cond ((numberp elt) (list elt)) ;; Expand character ranges of the ;; form (STARTING-CHAR . END-CHAR), ;; so that `char-list' is a list of ;; characters only. ((consp elt) (state-m-expand-char-range elt)) (t (error "Invalid MATCHER: %s" (car trans))))) (car trans))))))) (defun state-m-deal-with-lambda-maybe (expr) "Transform an anonymous function, if necessary. Unless EXPR is a lambda expression, return it unmodified. Lambda expressions and functions quoted with `function' are returned as a list of a `,' and the expression itself, so that `(lambda ...)' becomes `(\, (lambda ...))'. This is done in order to force byte compilation of anonymous functions specified as ADD-TO-RESULT or ADVANCE in a transition by wrapping the final states vectors into a backquote. This functions is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." ;; Example: in the expansion of `run-state-machine' the state ;; vectors will read: ;; (backquote [ ... (( ... (\, (lambda ...)) ... )) ...]) ;; instead of simply: ;; [ ... (( ... (lambda ...) ... )) ...] (if (and (listp expr) (or (eq (car expr) 'lambda) (eq (car expr) 'function))) (list '\, expr) expr)) (defun state-m-make-transitions (state-definitions) "Transform STATE-DEFINITIONS into a list of transitions. STATE-DEFINITIONS is a list whose elements have the form \(STATE-NAME TRANSITION-1 ... TRANSITION-N). Each transition has the name of the next state as its last element. This function gets rid of the state-names and replaces them in the transitions with an integer, because the expanded macro `run-state-machine' refers to states not by their names, but by their position in an array. This function returns a list of lists of transitions, whereby the N-th element in the returned list is the list of transitions that make up the definition of state N. This functions is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." ;; First associate the name of the states with consecutively ;; running integers, that will make up their position in the ;; vector. (let* ((count 0) (pos-alist (mapcar (lambda (elt) (cons elt (prog1 count (setq count (1+ count))))) (mapcar 'car state-definitions)))) ;; Replace in the transitions the names of the next states by ;; their position in the array. (mapcar (lambda (elt) (let ((trans (cdr elt)) (list nil)) (dolist (i trans (nreverse list)) (if (listp (car i)) ;; Expand transitions which specify a list of ;; alternative chars or a character range as ;; MATCHER. (dolist (elt (state-m-expand-transition i)) (push (list (car elt) ;; Transform anonymous functions ;; as ADD-TO-RESULT or ADVANCE, ;; if necessary. (state-m-deal-with-lambda-maybe (nth 1 elt)) (state-m-deal-with-lambda-maybe (nth 2 elt)) ;; Get the position of ;; the next state. (cdr (assq (nth 3 elt) pos-alist))) list)) ;; Since its car is not a list, transition `i' ;; needs no expansion. (push (list (car i) (state-m-deal-with-lambda-maybe (nth 1 i)) (state-m-deal-with-lambda-maybe (nth 2 i)) (cdr (assq (nth 3 i) pos-alist))) list))))) state-definitions))) ;; A state is defined by a set of possible transitions. We maintain ;; the definition of the state machine in up to three different ;; vectors, according to the type of the MATCHER of a transition. The ;; element N of each vector is a part of the definition of state N. It ;; may be a list of transitions of the same type or nil, if the state ;; definition contains no transitions of that type. At macroexpansion ;; time `run-state-machine' replaces the name of a state as NEXT-STATE ;; in each transition with an integer referring to the position of ;; that state in the vectors. ;; The three vectors are: ;; `regexp-vect' -- transitions whose MATCHER is a regexp ;; `char-vect' -- transitions whose MATCHER is a character ;; `defaults-vect' -- transitions whose MATCHER is the symbol t ;; For example a state definition like ;; ... ;; (example-state (?a t t another-state-1) ;; (?b t t another-state-2) ;; ("some \\(regexp\\)" 1 t another-state-3) ;; (t t t another-state-4)) ;; ... ;; could be transformed into something like this: (Of course, the name ;; of the next state -- `another-state-*' -- is replaced by an integer ;; indicating the position of that next state in the vectors.) ;; regexp-vector: ;; [ ... (("some \\(regexp\\)" 1 t 5)) ... ] ;; char-vector: ;; [ ... ((?a t t 3) (?b t t 4)) ... ] ;; defaults-vector: ;; [ ... ((t t t 6)) ... ] ;; The resulting state machine will check for a matching transition in ;; those vectors in that order, regular expressions first, defaults ;; last. ;; If a state machine contains no transitions of a certain type, the ;; according vector and the checking is omitted. That is, if a ;; state machine contains -- for example -- only transitions with a ;; character as MATCHER, there is no need to maintain a vector for ;; regexps and to check for a matching regexp at run time. (defun state-m-list-transitions-by-type (type trans-list) "Return a list of transitions of type TYPE. TYPE may be one of the symbols `regexp', `char' or `default'. TRANS-LIST is a list of lists of transitions as returned by `state-m-make-transitions'. The return value is a list, whose elements are lists of transitions whose MATCHER is of the type TYPE or nil. This is used to sort the transitions. This function is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." (let ((func (cond ((eq type 'regexp) 'stringp) ((eq type 'char) 'numberp) ((eq type 'default) (lambda (x) (eq x t))) (t (error "Unknown type: %s" type))))) (mapcar (lambda (state) (delq nil (mapcar (lambda (trans) (if (funcall func (car trans)) trans nil)) state))) trans-list))) (defun state-m-empty-vector-p (vect) "Return t if vector VECT contains only nil. This function is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." (let ((predicate t)) (dotimes (i (length vect) predicate) (when (aref vect i) (setq predicate nil))))) (defun state-m-parse-states (states-list) "Parse state definitions. Generate proper transitions from the state definitions, thereby expand transitions whose MATCHER is a character alternative. Sort the transitions into different vectors according to the type of their MATCHER. The return value is a list of three vectors. The first one contains transitions with a regular expression as MATCHER. The second one contains transitions with a character as MATCHER. And the third one contains the default transitions. This function is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." (let* ((transitions (state-m-make-transitions states-list)) ;; Make the states vectors by sorting the transitions ;; according to the type of MATCHER. (regexp-vect (apply 'vector (state-m-list-transitions-by-type 'regexp transitions))) (char-vect (apply 'vector (state-m-list-transitions-by-type 'char transitions))) (default-vect (apply 'vector (state-m-list-transitions-by-type 'default transitions)))) ;; Return the vectors in a list. (mapcar (lambda (vect-name) (if (state-m-empty-vector-p (symbol-value vect-name)) ;; If all elements of the vector are nil, return nil ;; instead of the vector. nil ;; Put the vector inside a `(backquote ...)' in order ;; to force byte compilation of anonymous functions as ;; ADD-TO-RESULT or ADVANCE (inside of ;; `state-m-make-transitions' we made sure that lambda ;; expressions are surrounded by the nessecary syntax ;; to accomplish this). (list 'backquote (symbol-value vect-name)))) '(regexp-vect char-vect default-vect)))) ;; The following is a bit hairy. We need to construct the ;; state machine, but each of the three possible vectors needs a ;; slightly different treatment. For example: to find the matching ;; transition the state machine has to check the MATCHER of each ;; transition in the regexp-vector with `looking-at', while a matching ;; transition in the char-vector is the first one whose MATCHER is ;; equal to the character after point and while the first transition ;; in the defaults-vector matches unconditionally. Something similar ;; is true for adding something to the return value of the ;; state machine and for advancing point in the buffer as specified by ;; ADD-TO-RESULT and ADVANCE. ;; We deal with this by modularizing the lisp-expressions for this ;; tasks and putting them together in the function ;; `state-m-make-main-expr'. ;; The benefit of this approach is, that if a state machine has no ;; transitions of one or two of the three types, then the expansion of ;; `run-state-machine' contains not a single trait of the according ;; vector. (defun state-m-make-find-expr (type vect state-var) "Return the expression to find the right transition. The first argument TYPE specifies the type of the transitions-vector, it may be one of the symbols `regexp', `char' or `default'. The second argument VECT is the vector that holds the states. The third argument STATE-VAR is the name of the variable that is used to hold the current state in the expanded code. This function is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." (cond ((eq type 'regexp) ;; regexp: go through each possible ;; transition until MATCHER in one of them ;; matches the text after point. `(catch 'state-found (dolist (trans (aref ,vect ,state-var) nil) (when (looking-at (car trans)) (throw 'state-found trans))))) ((eq type 'char) ;; character: get the transition whose ;; MATCHER is eq to the char after point. `(assq (char-after)(aref ,vect ,state-var))) ((eq type 'default) ;; default: get the next best transition ;; unconditionally. `(car (aref ,vect ,state-var))))) (defun state-m-make-add-expr (type aux-var output-var) "Return the expression to add to the return value of the state machine. The argument TYPE specifies the type of the transitions-vector, it may be one of the symbols `regexp', `char' or `default'. The second argument AUX-VAR is the name of an auxiliary variable that is used to store temporary input in the expanded code. In this case it holds the second element of a transition \(ADD-TO-RESULT). The third argument OUTPUT-VAR is the name of the variable that is used hold the future return value in the expanded code. This function is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." ;; In all cases ADD-TO-RESULT may be a function. But if it is not, ;; regexp-transitions differ from char-transitions and the defaults ;; in the respect that the former may specify a subexpression (an ;; integer) to add to the result, while it is only t or nil for the ;; latter. (if (eq type 'regexp) ;; Regexps: deal with subexpressions. `(if (numberp (if (eq ,aux-var t) ;; We interpret t as 0. (setq ,aux-var 0) ,aux-var)) (setq ,output-var (concat ,output-var (match-string ,aux-var))) (setq ,output-var (funcall ,aux-var ,output-var (match-string 0)))) ;; Chars/defaults: add the char after point. `(if (eq ,aux-var t) (setq ,output-var (concat ,output-var (char-to-string (char-after)))) (setq ,output-var (funcall ,aux-var ,output-var (char-to-string (char-after))))))) (defun state-m-make-advance-expr (type aux-var) "Return the expression to advance point in the current buffer. The argument TYPE specifies the type of the transitions-vector, it may be one of the symbols 'regexp, 'char or 'default. The second argument AUX-VAR is the name of a variable that is used to store temporary values in the expanded code. In this case it is used to hold the third element of a transition (ADVANCE). This function is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." ;; As with ADD-TO-RESULT, ADVANCE may be a function in all ;; cases. (if (eq type 'regexp) ;; Regexp: deal with subexpressions. `(if (numberp (if (eq ,aux-var t) ;; Again we interpret t as 0. (setq ,aux-var 0) ,aux-var)) (goto-char (match-end ,aux-var)) (funcall ,aux-var)) ;; Chars/defaults: move forward by one char. `(if (eq ,aux-var t) (forward-char 1) (funcall ,aux-var)))) (defun state-m-make-main-expr-1 (find-expr add-expr advance-expr state-var trans-var output-var aux-var) "Return an expression for the construction of a state machine. The returned expression finds the right transition in a transitions vector and deals with ADD-TO-RESULT and ADVANCE appropriately. (All this is determined by FIND-EXPR, ADD-EXPR and ADVANCE-EXPR.) The other arguments are the names of variables that are used in the expanded code: STATE-VAR -- name of the variable that holds the number of the current state, TRANS-VAR -- name of the variable that holds the matching transition, OUTPUT-VAR -- name of the variable that holds the result value, AUX-VAR -- name of a variable that is used to store temporary values. At run time the expression exits immediately and returns nil if it can not find any matching transition. This function is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." `(and (setq ,trans-var ,find-expr) (progn ;; Maybe add to the return value. (setq ,aux-var (nth 1 ,trans-var)) (when ,aux-var ,add-expr) (setq ,aux-var (nth 2 ,trans-var)) (when ,aux-var ,advance-expr) ;; Return the (position of the) next state. (nth 3 ,trans-var)))) (defun state-m-make-main-expr (states statevar outputvar transvar auxvar) "Return the main expression to construct a state machine. The first argument STATES is a list of definition of states; each element is a list whose car is the name of the state and whose cdr is a list of transitions. The following arguments are the names of variables that are used in the expanded code: STATEVAR -- variable that holds the number of the current state, OUTPUTVAR -- variable that holds the result value, TRANSVAR -- variable that holds the matching transition, AUXVAR -- variable used to store temporary values. This function is supposed to be used only internally by the macro `run-state-machine' during macroexpansion time." (let ((tr-vectors (state-m-parse-states states)) (expression (list `(error "State machine error at state %d" ,statevar)))) ;; The resulting expression shall check for the right transition ;; to the next state in that order: ;; look-up regexp-vector --> look-up char-vector --> ;; look-up defaults-vector --> error (dolist (type '((default . 2) (char . 1) (regexp . 0))) (when (nth (cdr type) tr-vectors) (push (state-m-make-main-expr-1 (state-m-make-find-expr (car type) (nth (cdr type) tr-vectors) statevar) (state-m-make-add-expr (car type) auxvar outputvar) (state-m-make-advance-expr (car type) auxvar) statevar transvar outputvar auxvar) expression))) (if (> (length expression) 1) (cons 'or expression) (car expression)))) ;;;###autoload (defmacro run-state-machine (spec &rest states) "Define a state machine and run it. The first argument SPEC is a list of the form \(RESULT-VARIABLE END-OF-BUFFER-EXPRESSION). RESULT-VARIABLE should be a symbol or nil. In the former case the symbol is used as the variable-name to store the return value of the state machine at run time. This makes it possible to get hold of the return value in the states, for example to change its type. By default the return value is a string. When the state machine reaches the end of the buffer at run time, it terminates. END-OF-BUFFER-EXPRESSION should be a valid Lisp expression, it is called when the state machine reaches the end of the current buffer. If it is omitted or nil, the default behaviour is to return nil. The following arguments are the definitions of the state. Each state defines one or more transitions to following states. The syntax is: STATE ::= ( NAME TRANSITION {TRANSITION} ) TRANSITION ::= ( MATCHER ADD-TO-RESULT ADVANCE NEXT-STATE ) MATCHER ::= CHARACTER | CHAR-ALTERN | REGEXP | t CHAR-ALTERN ::= CHAR-LIST | CHAR-RANGE CHAR-LIST ::= ( {CHARACTER} | {CHAR-RANGE} ) CHAR-RANGE ::= ( CHARACTER . CHARACTER ) ADD-TO-RESULT ::= t | nil | FUNCTION | INTEGER ADVANCE ::= t | nil | FUNCTION | INTEGER In each state the state machine goes through the transitions to find the following state. As soon as it finds a \"matching\" transition, it adds to the return value depending on ADD-TO-RESULT, moves point in the buffer according to ADVANCE and switches to the state indicated by NEXT-STATE. If MATCHER is a regular expression, the transition matches if the regexp matches the text after point with `looking-at'. If MATCHER is a character, the transition matches, when the character after point is `eq' to MATCHER. If MATCHER is t, the transition matches unconditionally. Note that, while the order of transitions in a state-definition is somewhat important, regular expressions as MATCHER have always precedence, regardless where they were defined. Likewise, the state machine checks the default transition (a transition whose MATCHER is `t') always last. If ADD-TO-RESULT is t, the state machine appends the character after point (if MATCHER is a char or t) or the matching text (if MATCHER is a regexp) to the return value. If MATCHER is a regular expression, ADD-TO-RESULT may be an integer, specifying which subexpression to add to the return value. ADD-TO-RESULT may be a function that takes two arguments: the current return value and the pending input. Both are strings. The return value of the state machine is set to the returned value of this function. If ADVANCE is t, the state machine moves point forward by one character (if MATCHER is a character or t) or to the end of the matching text (if MATCHER is a regexp). If MATCHER is a regular expression, ADVANCE may be an integer, specifying the subexpression to whose end point should move. ADVANCE may be a function that takes no arguments. In that case this function takes the responsibility to move point. The starting state is always the first state in the row. Specifying the state name `exit' as the following state in a transition means to terminate processing; you can't define a state named `exit'." ;; Add an exit state at position 0. (push (list 'exit) states) (let ((state (make-symbol "state")) (eob-function (or (cadr spec) 'ignore)) (retval (or (car spec) (make-symbol "return-value"))) (state (make-symbol "state")) (trans (make-symbol "transition")) (auxvar (make-symbol "auxiliary-variable"))) `(let ((,state 1) (,retval nil) ,trans ,auxvar) (while (> ,state 0) ; 0 is the "exit" state. (if (eobp) (progn ,(if (cdr spec) `(setq ,retval ,(cadr spec)) `(setq ,retval nil)) (setq ,state 0)) ; exit (setq ,state ,(state-m-make-main-expr states state retval trans auxvar)))) ,retval))) (provide 'state-m) ;;Local Variables: ;;no-byte-compile: t ;;End: ;;; state-m.el ends here --=-=-= Patch for lispref/text.texi regarding state-m.el: --=-=-= Content-Type: text/x-patch; charset=iso-8859-1 Content-Disposition: attachment; filename=text.diff Content-Transfer-Encoding: 8bit Index: lispref/text.texi =================================================================== RCS file: /cvsroot/emacs/emacs/lispref/text.texi,v retrieving revision 1.57 diff -c -r1.57 text.texi *** lispref/text.texi 13 Sep 2002 19:36:55 -0000 1.57 --- lispref/text.texi 7 Nov 2002 17:34:11 -0000 *************** *** 59,64 **** --- 59,65 ---- * Base 64:: Conversion to or from base 64 encoding. * MD5 Checksum:: Compute the MD5 ``message digest''/``checksum''. * Change Hooks:: Supplying functions to be run when text is changed. + * State Machines:: Writing reader functions. @end menu @node Near Point *************** *** 3779,3781 **** --- 3780,4267 ---- This variable is available starting in Emacs 21. @end defvar + + @node State Machines + @section Writing simple reader functions + @cindex state machines + + A state machine is a model to describe complex control structures. A + state machine is always in a certain state, it receives some input, + possibly generates some output, then it switches to another state, + where it repeats. + + State machines are useful in a wide range of computing tasks. In fact + a computer itself is only a very complex state machine. One area, + however, where they are frequently used, is for reading and parsing + textual input. Emacs provides the macro @code{run-state-machine} to + construct state machines for the purpose of reading text from a + buffer. + + @menu + * State Machine Basics:: A short introduction to state machines. + * Example State Machine:: A basic example. + * Defining State Machines:: Defining state machines for reader + functions. + * Character Based Reading:: Switching states based on the character + after point. + * Regexp Based Reading:: Switching states based on the buffer text + after point. + * State Machine Notes:: Caveats and notes on performance. + @end menu + + @node State Machine Basics + @subsection State Machines -- an Introduction + + A traffic light may serve as a very basic example of a state + machine. It has four states: a red state, a green state and two yellow + states. There is one single input event, the signal ``Change the + colour now!''. As ``output'' it turns certain electric bulbs on or + off. + + @example + @group + +-------+ + ,--------| red |<-------. + | +-------+ | + | | + | | + v | + +-------+ +-------+ + |yellow1| |yellow2| + +-------+ +-------+ + | ^ + | | + | +-------+ | + `------->| green |--------´ + +-------+ + @end group + @end example + + When a state machine receives input, it decides which state should be + the next state. In a traffic light this decision is very simple, + because in each state there is only one type of input event and one + transition to another state possible: when the traffic light is in the + state ``red'' and it receives valid input, it switches to the state + ``yellow1'' unconditionally. A more complex state machine would allow + more types of input events and would depend the decision about the + next state not only on the current state, but also on the + @emph{specific} input event. A state machine that is made to read + textual input would typically depend the switching to the next state + on the actual character it received. + + The way we describe the traffic light above, we assume that it was + always running and that it will run forever. This is, of course, also + not true for a reader. Most state machines would define one state as a + starting state and one or more states as final ones; so that the state + machine terminates after switching to a state declared to be a final + one. + + @node Example State Machine + @subsection A Basic Example of a State Machine + + The macro @code{run-state-machine} provides an easy way to define and + run state machines in Emacs Lisp for the purpose of reading input from + a buffer. Here is a basic usage example as an introduction. + + Suppose we need a function that parses the content of a buffer for + strings. Each time we call this function, it should scan the buffer + for a quotation mark and return the buffer content up to the next + quotation mark, thereby moving point forward. We could use this + function to get every string from the buffer if we call it often + enough. Here is how it could be done with @code{run-state-machine}: + + @lisp + @group + (defun read-next-string () + (run-state-machine () + (look-for-string (?\" nil t read-string) + (t nil t look-for-string)) + (read-string (?\" nil t exit) + (t t t read-string)))) + @end group + @end lisp + + This state machine has only two states, @code{look-for-string} and + @code{read-string}. In each state the machine looks at the character + after point and decides whether it should stay in the same state or + switch to another one. In both cases, however, it moves point forward + by one character. The possible transitions to other states are + defined as lists after the name of the state. Here each state has two + of them. + + The starting state is @code{look-for-string}, because it is the first + one listed. When the machine is in the state @code{look-for-string}, + it checks wether the character after point is a @samp{"}. This is + indicated by the @code{?\"} at the beginning of the first transition. + If it is, it switches to the state @code{read-string}. If the + character after point is anything else but a @samp{"} (indicated by + the symbol @code{t} at the beginning of the second transition), the + machine stays in the state @code{look-for-string}. + + When the state machine is in the state @code{read-string} it checks + again wether the character after point is a @samp{"}. But this time + it terminates if it is. This is specified by the special state name + @code{exit}. In all other cases the string it currently reads is not + finished, so the machine stays it in the the state @code{read-string}. + + On each transition to another state the state machine may store the + character after point internally. The concatentation of all stored + characters is the return value of @code{run-state-machine}. The + second element in the definition of a transition specifies if the + character after point should be appended to the return value in this + way or not. In our example it is t only in the second transition of + @code{read-string}, because we want to return only characters between + two @samp{"}. The third element indicates if the state machine should + move point forward in the buffer; here it is @code{t} in all + transitions. + + This is a diagram of the states and transitions: + @example + @group + Start + | + | + | + v + +-------------------+ + |look-for-string |<---. Character is not a + +-------------------+ | quotation mark. + | | | * Do not store it. + Character is a | `----------´ * Move point forward. + quotation mark. | + * Do not store it. | + * Move point forward. | + v + +-------------------+ + |read-string |<---. Character is not a + +-------------+-----+ | quotation mark. + | | | * Store it. + Character is a | `----------´ * Move point forward. + quotation mark. | + * Do not store it. | + * Move point forward. | + v + Exit + @end group + @end example + + @node Defining State Machines + @subsection Defining State Machines + @cindex writing a reader + + The macro @code{run-state-machine} provides a mini-language to define + state machines for reader-functions. To define a state machine, you + basically list a bunch of states and specify what the machine should + do in the different states by listing and defining the transitions + from each state to following ones. This is explained in more detail + in @ref{Character Based Reading} and @ref{Regexp Based Reading}. + + @deffn Macro run-state-machine spec &rest states + This macro defines and executes a state machine. + + On each transition from one state to the next one the state machine + may accumulate characters or strings from the current buffer. When + the machine terminates, it returns these per default as a concatenated + string; it returns @code{nil}, when no characters or strings were + accumulated. + + The first argument @var{spec}, if non-@code{nil}, is a list of the + form + + @example + (@var{result-variable} @var{end-of-buffer-expression}). + @end example + + @var{result-variable} should be a symbol or @code{nil}. The symbol is + used as the name of the variable that the state machine uses to store + its return value at run time. Specify a symbol, if you want to access + this variable. If you are not interested in what variable name the + state machine uses internally, specify @code{nil}. In this case the + state machine still returns the accumulated characters or strings, but + it deals with them transparently. Please note, that specifying a + symbol might not be necessary in many cases, because you can also + control the return value to a large extend by specifying a function as + @var{add-to-result} in the definition of a transition (see below). + + When a state machine encounters the end of the buffer at run time, it + terminates and returns @code{nil} by default. To change the + return-value, you can specify a Lisp expression as + @var{end-of-buffer-expression} that is executed in this case; the + return value of this expression becomes the return value of the state + machine. To return the accumulated result in whatever state it may be + at that time, simply set @var{result-variable} to some symbol and + repeat that symbol as @var{end-of-buffer-expression}. For example, + @code{(output output)} as @var{spec} leads to a state machine that + uses the variable @var{output} to store the result and returns the + value of @var{output} unmodified, when it terminates at the end of the + buffer. + + @var{spec} may be @code{nil}. This is equivalent to @code{(nil nil)}. + @var{spec} is followed by a number of one or more definitions of + states. + + The definition of a state consists of the name of the state followed + by one or more definitions of transitions to following states. The + starting state is always the first one listed. Specifying the state + name @code{exit} as the following state in a transition means to + terminate processing; you can't define a state named @code{exit}. + + A transition is a list of the form + + @example + (@var{matcher} @var{add-to-result} @var{advance} @var{next-state}) + @end example + + @var{matcher} may be either a regular expression (@pxref{Regexp Based Reading}). + Or a single character (@pxref{Character Type}), a cons cell + of two characters indicating a range of characters or a list of + characters or character ranges indicating character + alternatives. @xref{Character Based Reading}. + + If @var{add-to-result} is @code{t}, the character after point is + appended to the return-value (which--in the normal case--is a string). + If @var{advance} is @code{t}, the machine moves point forward by one + character. Then it switches to the state specified by + @var{next-state}. If @var{matcher} is a regular expression, + @var{add-to-result} and @var{advance} may be integers. In this case + the integer specifies the subexpression of @var{matcher} which should + be added to the result or to whose end point should move, + respectively. @xref{Regexp Based Reading}. + + @var{add-to-result} may be a function. In this case the function + should take two arguments. The first one being the current + return-value and the second one being the pending input-character (as + a string!). The return value of the state machine is then set to the + value returned by this function. If @var{advance} is a function, it + receives no argument. This function takes the full resposibility for + moving point. For example a transition + @code{(?a t t another-state)} + could be written equivalently (modulo performance) as + @code{(?a concat forward-char another-state)}. + @end deffn + + @node Character Based Reading + @subsection Switching States Based on the Character after Point + + In character-based reading, a state machine specified with + @code{run-state-machine} switches states based on the character after + point. + + In the simple case @var{matcher} is either a character or the symbol + @code{t}. To find the right transition, the state machine looks for a + transition whose @var{matcher} is @code{eq} to the character after + point. If it finds none, it looks for a transition whose @var{matcher} + is @code{t}. + + For example, a state could look like this: + + @example + @group + ;; MATCHER ADD-TO-RESULT ADVANCE NEXT-STATE + (look-for-a (?a t t another-state-1) + (t nil nil another-state-2)) + @end group + @end example + + When the state machine is in the state @code{look-for-a}, it checks if + the character after point is an @samp{a}. If it is, it adds an + @samp{a} to the return value, moves point forward and switches to the + state @code{another-state-1}. Else, it does not add anything to the + return value, leaves point where it is and switches to the state + @code{another-state-2}. + + You can specify alternative characters or a range of characters as + @var{matcher} in a transition. To specify a range of characters, + define a cons cell of the first and the last character. @code{(?a + . ?z)} as @var{matcher} matches all characters from @samp{a} up to + @samp{z}. To specify alternative matches, use a list of characters or + of character ranges. For example @code{(?a ?b ?c)} as @var{matcher} + matches the characters @samp{a}, @samp{b} or @samp{c}, while + @code{((?0 . ?9) ?-)} matches digits and hyphens. + + As an extended example, here is the definition of a reader-function + @code{example-reader} that returns everything inside @samp{"} as a + string, digits as integers and everything else as a symbol. Each time + the function is called it reads the next object from the current + buffer and returns it according to its type. If it encounters the end + of the buffer, it raises an error. + + @smalllisp + @group + (defun exmpl-make-number (output ignore) + (string-to-number output)) + @end group + + @group + (defun exmpl-make-symbol (output ignore) + (make-symbol output)) + @end group + + @group + (defun example-reader () + (run-state-machine (nil (error "End of buffer")) + ;; @r{Starting state.} + (start (?\" nil t read-string) + ((?0 . ?9) t t read-integer) + ((?\n ?\t ?\ ) nil t skip-white-space) + (t t t read-symbol)) + ;; @r{Skip white-space} + (skip-white-space ((?\t ?\n ?\ ) nil t skip-white-space) + (t nil nil start)) + ;; @r{Read a string: add everything to the return value up to the} + ;; @r{next `"'. Then exit.} + (read-string (?\" nil t exit) + (t t t read-string)) + ;; @r{Read an integer: add every digit to the return value. Every} + ;; @r{other character causes the machine to exit. Convert the} + ;; @r{return-value to an integer upon exit.} + (read-integer ((?0 . ?9) t t read-integer) + (t exmpl-make-number nil exit)) + ;; @r{Read a symbol: read everything which is not a digit, a `"' or a} + ;; @r{white-space character. Return a symbol.} + (read-symbol ((?\" ?\ ?\t ?\n (?0 . ?9)) + exmpl-make-symbol nil exit) + (t t t read-symbol)))) + @end group + @end smalllisp + + @node Regexp Based Reading + @subsection Switching States Based on the Buffer Text after Point + + Instead of a character (or a list of character-alternatives) or the + symbol @code{t}, @var{matcher} may be a regular expression. + @xref{Regular Expressions}. In this case the state machine checks for + a matching transition by applying the regexp with @code{looking-at} to + the text after point. + + When @var{matcher} is a regexp @var{add-to-result} and @var{advance} + may be integers. An integer as @var{add-to-result} specifies the + subexpression which is added to the return-value. An integer as + @var{advance} specifies to the end of which subexpression point should + move. In both cases @code{t} is interpreted as @code{0}. + + As an example, suppose we have a silly database file in which we store + information about persons, animals and text editors. Now we want to + extract the names of persons by consecutive calls to a function + @code{read-person-name}; i. e. the function should skip the records + for animals and text editors as well as records for persons where the + name field is omitted. The entries could look like this: + + @example + @group + Type Animal + Name Frog + Colour Green + @end group + + @group + Type Person + # name unknown + Colour Red # colour of hair + Number 12345 # postal code + @end group + + @group + Type Text-Editor + # The One True Editor + Name Emacs + Number 21.4 # version + @end group + + @group + Type Person + Name Tars Tarkas # This is the name we want. + Colour Green + @end group + + @end example + + To simplify things, we assume that the entry for each field is on a + line of it's own, so we can do the parsing line-wise. Fields may + occur in abitrary order, except for the @samp{type} field, which must + be the first field in the record. Each record starts with a + declaration of the @samp{type}; @samp{#}s are comment-characters; + empty lines and leading whitespace are legal, but not significant; + case is also insignificant. + + @lisp + @group + (defun read-person-name () + "Read the next name in the category \"Person\"." + (let ((case-fold-search t)) ; ignore case + (run-state-machine () + (look-for-person + ("\\s-*type\\s-+person" nil forward-line get-name) ; @r{person found} + (t nil forward-line look-for-person)) ; @r{keep on searching} + (get-name + ("\\s-*type" nil nil look-for-person) ; @r{no name was provided} + ("\\s-*name\\s-+\\(.*?\\)\\s-*#.*$" 1 forward-line exit) ; @r{name found} + (t nil forward-line get-name))))) ; @r{keep on searching} + @end group + @end lisp + + This function parses the text line by line, because @var{advance} is + either @code{nil} or the function @code{forward-line}; one effect of + this is that all the regular expressions match from the beginning of + the line. When it is in the state @code{look-for-person} the function + moves forward in the buffer until it finds a record that starts with + @samp{type person}. Then it switches to the state @code{get-name}. + Again it moves point forward line by line until it finds a @samp{name} + field. When it finds one, the state machine terminates and returns + the name; this happens in the second transition of @code{get-name}: + the regexp is constructed in such a way that characters after + @samp{name} and before any @samp{#} match the first subexpression, + which is added to the return value, because @var{add-to-result} is + @code{1} in this transition. + + But when the state machine encounters another @samp{type} field + indicating a new record before it finds a name, it switches back to + the state @code{look-for-person}. This is specified in the first + transition of @code{get-name}. (@var{advance} is @code{nil} here, + because @samp{type} could be @samp{person} again. Then the state + @code{look-for-person} should switch back to @code{get-name} + immediately.) So if point is at the beginning of the first record in + the example (@samp{Type Animal}), the first call to + @code{read-person-name} will return @samp{"Tars Tarkas"}, although + there is another person in between. But this other record provides no + name and is therefore ignored. + + @node State Machine Notes + @subsection Caveats and Notes on Performance + + @enumerate + + @item + Although it is possible to apply both transitions with regular + expressions and with characters as @var{matcher} in one and the same + state machine, this might add some undesirable additional overhead, + especially if the majority of transitions has characters as + @var{matcher}. In this case it might be worth the extra effort to get + rid of regexp entirely for the sake of performance. This is due to + the internal handling of the transitions. As far as this note is + concerned, character alternatives and character ranges count as + @var{matcher} of the type ``character''. + + @item + The checking for transitions with regexps as @var{matcher}, characters + as @var{matcher} and the default transitions is done independently and + in this order: regexps first, defaults last. + + @item + The order of transitions in the definition of a state is significant. + If any two transitions whose @var{matcher} is of the same type would + match in the current current buffer, the state machine chooses the + transition that comes first in the state-definition. For example if + one transition has @code{?y} as @var{matcher} and another transition + has @code{(?a . ?z)} as @var{matcher} and the character after point is + a @samp{y}, then the transition that comes first in the definition of + the state is chosen. @emph{But}: regexps as @var{matcher} come always + before characters as @var{matcher} and defaults come always last, + regardless of where they were defined. + + @item + The macro @code{run-state-machine} does quite some computation on each + expansion. Therefore it is strongly recommended to byte-compile a + package that uses it. + + @end enumerate --=-=-= Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Oliver -- Oliver Scholz 23 Brumaire an 211 de la Révolution Taunusstr. 25 Liberté, Egalité, Fraternité! 60329 Frankfurt a. M. http://www.jungdemokratenhessen.de Tel. (069) 97 40 99 42 http://www.jdjl.org --=-=-=--