Parsing and logic programming

unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed

* Parsing and logic programming
@ 2013-08-10 21:22 Stefan Israelsson Tampe
  0 siblings, 0 replies; only message in thread
From: Stefan Israelsson Tampe @ 2013-08-10 21:22 UTC (permalink / raw)
  To: guile-user, guile-devel

Hi all!

From time to time I return to logic programming. Scheme has kanren,
but I prefer the more feature rich environment that comes with 
guile-log (It actually sports a kanren interface as well that should be 
the 
fastest way to run kanren on guile. 

A quite recent feature I added was support for returning multiple
values in logic programs. The mechanism is based on enlarging the 
continuation closures with more arguments. This means that although
everything is recursive, we do always stick to tail calls and hence
these logic programs does not risk of blowing the stack (It can blow
the heap though and main kanren contains mechanisms can even not blow
the heap). One method to not blow the heap in guile log is to use
return values for some set of algorithms. There is a drawback I must 
confess using the return values, and that is the we do not get a
'equations' like features for items. This is fine sometimes but
inferior at other times. To see an example of the equation property
is e.g. 

   (kanren-cons car cdr result)

car and cdr is input and result is the output in our minds, but it's
just ass fine to enter result= (1 . 2) car = var, cdr = var and get
the car = 1 and cdr = 2 after the function have been handled. 

What is return values, well this is forcing things to be only return 
values. In guile-log we do this by e.g.,

   1. put (<cc> x y z) in tail position in e.g. function f.
   2. "call" a function f with (<values> (x y z) (f a b c))
      note how we cannot peak bindings to x y z in function f. This
      shows that we are limiting ourself on the ability of making a
      'equational' like logic.

To note here is that x y z will be just normal scheme variables that
can be used further down the logic chain. 

When is this logic preferable? Well it's possible to make parsers
using logic programming and make use of their backtracking
features. It will be slower to do so then good 'ol targeted tools but
the toolbox that comes with guile-log means that advanced stuff can be
pretty simple to do. 

To exemplify, let's try to use guile-log to deduce a little framework
for parsing and tokenizing. Here we go (DISCLAIMER NOT DEBUGGED CODE).

------------------------------------------------
(use-modules (logic guile-log))
(use-modules ((ice-9 match) #:renamer (prepend-tag 'ice-9-')))

;; this one catches the common pattern of matching and gives
;; some syntactic sugare, please read on.

(define-syntax-rules (pmatch (x xl c n m) (p1 p2 code ...) ...)
  (<match> (#:mode +) (x c)
	   (p1 p2 (<cut> (<and> code ...)))
	   ...
	   (_ _   (<cut> <fail>))))

;; read on
(define-syntax-rules (plambda (x xl c n m) . l)
  (<lambda> (x xl c n m)
    (pmatch (x xl c n m) . l)))

;; we need some notion of always fail and always success
(define p-false (<lambda> x <fail>))
(define p-true  (<lambda> (x xl c n m) (<cc> x xl c n m)))

;; se how we have the ice-9 matcher framework tooled to have prolog
;; like lambdas. the fuction signature (x xl c n m) are standard and
;; will consist of 
;; x  =  the stream of characters for one line (list)
;; xl =  list of lines e.g. xl = (list x1 x2 ...)
;; c  =  this is the imprint of the result.
;; n  =  col number
;; m  =  line number

(define (p-test f)
 (plambda (x xl c n m)
   (((? f ch) . l) (ch . cc)
    (<cc> x l cc (+ n 1) m))))
#|
The pattern is (list-stream-pat result-pat code ...) Typically 
c is the incomming inprint ans id just a unbound variable, unifying 
against
(ch . cc) means that cc is bound to the cdr a unbound varaible and and
the car of c will havd the ch inprint, just as in prolog.

  list-stream-pat = (? f ch)  - we check if f is true on ch
  result-pat      = (ch . cc) - inprint cc new unbound variable

and we just return the values with
(<cc> x l cc (+ n 1) m)))) ; note that we add one col number.
|#

;; Lets define a ideom that generat a "matcher" for a char.
;; This shows how we use p-test
(define (p-char ch) (p-test (lambda (x) (eq? x ch))))

;; a sequence of matchers can be defined using e.g (p-seq f1 f2 f3 f4 
...)
(define (p-seq . l)
  (ice-match l
    (()
     p-true  ;; zero always matches
    ((f1) 
     f1)     ;; one matcher is just one matcher
    ((f1 . l)
     (let ((f2 (apply p-seq l))) ;; f2 is the sequence of the rest of
                                 ;; the matchers

       ;; The matcher is:
       (<lambda> (x xl c n m)
	 (values (x xl c n m) (f1 x xl c n m))  ;; First call f1, get return
	 (f2 x xl c n m))))))                   ;; Then scan the rest.

;; ok, lets assume that we want a and b and c to match the same prefix
;; and that we return the last prefix, here is the code for that
(define (p-and . l)
  (ice-match l
    (()
     p-true
    ((f) 
     f)
    ((f . l)
     (let ((f2 (apply p-and l)))
       (<lambda> (x xl c n m)
	 (<and> (f x xl c n m) (f2 x xl c n m)))))))

;; This will try to match a if that fails b the one that successes
;; return their values.

(define (p-or . l)
  (ice-match l
    (()
     p-false
    ((f) 
     f)
    ((f . l)
     (let ((f2 (apply p-or l)))
       (<lambda> (x xl c n m)
	 (<or> (f x xl c n m) (f2 x xl c n m)))))))

#|
Now we have made a small toolbox of basic logic for combining matcher,
you see the pattern and we could add interleaving versions and so on
of and etc, quite easally.

Anyhow, people typically write tokenizer and parser separatly because
the tokens are w well confined entity and is typically unambighues and
can be decided ffrom just local properties. To mimic this separation
one can employ the following little tokenizer construct that
essentially creates tokens and store them. If we backtrack and reissue
the logic the stored value will be used.
|#

;; Some global state
(define run-n 0)
(define run-m 0)
(define map   #f)

;; We need to reset the state
(define (clear-tokens)
  (set! run-n 0)
  (set! run-m 0)
  (set! map (make-has-table)))

(clear-tokens)

;; The idea of this function is to perform a tokenizing activity
;; utilizing this means that we loose the ability to redo and undo

;; generally this is not as effective as a regexp tokenizer but should be 
a
;; much faster then doing a full parser of everything.

(define (mk-token f mk)
  (<lambda> (x xl c n m)
     ;; Check to see if we are in new territory
     (if (or (> m run-m) (and (= run-m m) (> n run-n)))
         ;; Virgin land here
	 (<var> (cnew)                               ;; Create a fresh 
variable
	  (values (x xl cc n2 m2) (f x xl cnew n m)) ;; Call the matcher
	  (<=> cc '())                               ;; '() marks end
                                                     ;;  of token
	  (<let> ((val (mk (<scm> cnew))))           ;; get the scheme
                                                     ;; list and apply 
mk

	    (<code>                                 ;; store the new land
	     (set! run-n n2)
	     (set! run-m m2)
	     (hash-set! map (cons n m) (list x xl val n2 m2))
	   (<match> () (c)                          ;; add val to the 
	     ((,val . o))                           ;; parent res stream
	        (<cc> x xl o n2 m2))))

	 (<let> ((v (hash-ref map (cons n m) #f))) ;; retrieve token data
	     (<match> () (v c)                     ;; add it to the parent
	       ((x xl val n m) (val . o)           ;; res stream
	        (<cc> x xl o n m))))))

;; zero or more f matches
(define (p* f) (p-or (p-seq f (p* f)) p-true))

;; one or more f matches
(define (p+ f) (p-seq f (p* f)))

;; read-mk, used in tokenizer when sheme can translate a token
(define (read-mk x) 
  (with-input-from-string (list->string x) read))

;; Example, create a number token
(define p-digit   (p-test char-numeric?))
(define p-0       (p-char #\0))
(define p-.       (p-char #\.))
(define p-decimal (p-seq (p+ p-digit) 
			 (p-or (p-seq p-.
				      (p* p-digit))
			        p-true)))

(define p-exponent (p-seq p-decimal 
			  (p-char #\e)
			  (p-or (p-char #\+)
				(p-char #\-)
				p-true)
			  p-decimal))

(define p-number    (p-or p-exponent p-decimal))
(define tok-number  (mk-number p-number read-mk))

------------------------------------------------------
For any of you, a good exercise is to download kanren and do a similar 
framework using return values Have fun!

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2013-08-10 21:22 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-10 21:22 Parsing and logic programming Stefan Israelsson Tampe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).