unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Stefan Israelsson Tampe <stefan.itampe@gmail.com>
To: guile-user@gnu.org, guile-devel@gnu.org
Subject: Parsing and logic programming
Date: Sat, 10 Aug 2013 23:22:05 +0200	[thread overview]
Message-ID: <1453317.eM6fmhrFaU@warperdoze> (raw)

Hi all!

From time to time I return to logic programming. Scheme has kanren,
but I prefer the more feature rich environment that comes with 
guile-log (It actually sports a kanren interface as well that should be 
the 
fastest way to run kanren on guile. 

A quite recent feature I added was support for returning multiple
values in logic programs. The mechanism is based on enlarging the 
continuation closures with more arguments. This means that although
everything is recursive, we do always stick to tail calls and hence
these logic programs does not risk of blowing the stack (It can blow
the heap though and main kanren contains mechanisms can even not blow
the heap). One method to not blow the heap in guile log is to use
return values for some set of algorithms. There is a drawback I must 
confess using the return values, and that is the we do not get a
'equations' like features for items. This is fine sometimes but
inferior at other times. To see an example of the equation property
is e.g. 

   (kanren-cons car cdr result)

car and cdr is input and result is the output in our minds, but it's
just ass fine to enter result= (1 . 2) car = var, cdr = var and get
the car = 1 and cdr = 2 after the function have been handled. 

What is return values, well this is forcing things to be only return 
values. In guile-log we do this by e.g.,

   1. put (<cc> x y z) in tail position in e.g. function f.
   2. "call" a function f with (<values> (x y z) (f a b c))
      note how we cannot peak bindings to x y z in function f. This
      shows that we are limiting ourself on the ability of making a
      'equational' like logic.

To note here is that x y z will be just normal scheme variables that
can be used further down the logic chain. 

When is this logic preferable? Well it's possible to make parsers
using logic programming and make use of their backtracking
features. It will be slower to do so then good 'ol targeted tools but
the toolbox that comes with guile-log means that advanced stuff can be
pretty simple to do. 

To exemplify, let's try to use guile-log to deduce a little framework
for parsing and tokenizing. Here we go (DISCLAIMER NOT DEBUGGED CODE).


------------------------------------------------
(use-modules (logic guile-log))
(use-modules ((ice-9 match) #:renamer (prepend-tag 'ice-9-')))


;; this one catches the common pattern of matching and gives
;; some syntactic sugare, please read on.

(define-syntax-rules (pmatch (x xl c n m) (p1 p2 code ...) ...)
  (<match> (#:mode +) (x c)
	   (p1 p2 (<cut> (<and> code ...)))
	   ...
	   (_ _   (<cut> <fail>))))

;; read on
(define-syntax-rules (plambda (x xl c n m) . l)
  (<lambda> (x xl c n m)
    (pmatch (x xl c n m) . l)))

;; we need some notion of always fail and always success
(define p-false (<lambda> x <fail>))
(define p-true  (<lambda> (x xl c n m) (<cc> x xl c n m)))
			  
;; se how we have the ice-9 matcher framework tooled to have prolog
;; like lambdas. the fuction signature (x xl c n m) are standard and
;; will consist of 
;; x  =  the stream of characters for one line (list)
;; xl =  list of lines e.g. xl = (list x1 x2 ...)
;; c  =  this is the imprint of the result.
;; n  =  col number
;; m  =  line number

(define (p-test f)
 (plambda (x xl c n m)
   (((? f ch) . l) (ch . cc)
    (<cc> x l cc (+ n 1) m))))
#|
The pattern is (list-stream-pat result-pat code ...) Typically 
c is the incomming inprint ans id just a unbound variable, unifying 
against
(ch . cc) means that cc is bound to the cdr a unbound varaible and and
the car of c will havd the ch inprint, just as in prolog.

  list-stream-pat = (? f ch)  - we check if f is true on ch
  result-pat      = (ch . cc) - inprint cc new unbound variable

and we just return the values with
(<cc> x l cc (+ n 1) m)))) ; note that we add one col number.
|#
      
;; Lets define a ideom that generat a "matcher" for a char.
;; This shows how we use p-test
(define (p-char ch) (p-test (lambda (x) (eq? x ch))))

;; a sequence of matchers can be defined using e.g (p-seq f1 f2 f3 f4 
...)
(define (p-seq . l)
  (ice-match l
    (()
     p-true  ;; zero always matches
    ((f1) 
     f1)     ;; one matcher is just one matcher
    ((f1 . l)
     (let ((f2 (apply p-seq l))) ;; f2 is the sequence of the rest of
                                 ;; the matchers

       ;; The matcher is:
       (<lambda> (x xl c n m)
	 (values (x xl c n m) (f1 x xl c n m))  ;; First call f1, get return
	 (f2 x xl c n m))))))                   ;; Then scan the rest.

;; ok, lets assume that we want a and b and c to match the same prefix
;; and that we return the last prefix, here is the code for that
(define (p-and . l)
  (ice-match l
    (()
     p-true
    ((f) 
     f)
    ((f . l)
     (let ((f2 (apply p-and l)))
       (<lambda> (x xl c n m)
	 (<and> (f x xl c n m) (f2 x xl c n m)))))))

;; This will try to match a if that fails b the one that successes
;; return their values.

(define (p-or . l)
  (ice-match l
    (()
     p-false
    ((f) 
     f)
    ((f . l)
     (let ((f2 (apply p-or l)))
       (<lambda> (x xl c n m)
	 (<or> (f x xl c n m) (f2 x xl c n m)))))))

#|
Now we have made a small toolbox of basic logic for combining matcher,
you see the pattern and we could add interleaving versions and so on
of and etc, quite easally.

Anyhow, people typically write tokenizer and parser separatly because
the tokens are w well confined entity and is typically unambighues and
can be decided ffrom just local properties. To mimic this separation
one can employ the following little tokenizer construct that
essentially creates tokens and store them. If we backtrack and reissue
the logic the stored value will be used.
|#

;; Some global state
(define run-n 0)
(define run-m 0)
(define map   #f)

;; We need to reset the state
(define (clear-tokens)
  (set! run-n 0)
  (set! run-m 0)
  (set! map (make-has-table)))

(clear-tokens)

;; The idea of this function is to perform a tokenizing activity
;; utilizing this means that we loose the ability to redo and undo

;; generally this is not as effective as a regexp tokenizer but should be 
a
;; much faster then doing a full parser of everything.

(define (mk-token f mk)
  (<lambda> (x xl c n m)
     ;; Check to see if we are in new territory
     (if (or (> m run-m) (and (= run-m m) (> n run-n)))
         ;; Virgin land here
	 (<var> (cnew)                               ;; Create a fresh 
variable
	  (values (x xl cc n2 m2) (f x xl cnew n m)) ;; Call the matcher
	  (<=> cc '())                               ;; '() marks end
                                                     ;;  of token
	  (<let> ((val (mk (<scm> cnew))))           ;; get the scheme
                                                     ;; list and apply 
mk
          
	    (<code>                                 ;; store the new land
	     (set! run-n n2)
	     (set! run-m m2)
	     (hash-set! map (cons n m) (list x xl val n2 m2))
	   (<match> () (c)                          ;; add val to the 
	     ((,val . o))                           ;; parent res stream
	        (<cc> x xl o n2 m2))))

	 (<let> ((v (hash-ref map (cons n m) #f))) ;; retrieve token data
	     (<match> () (v c)                     ;; add it to the parent
	       ((x xl val n m) (val . o)           ;; res stream
	        (<cc> x xl o n m))))))
	      	 
;; zero or more f matches
(define (p* f) (p-or (p-seq f (p* f)) p-true))

;; one or more f matches
(define (p+ f) (p-seq f (p* f)))

;; read-mk, used in tokenizer when sheme can translate a token
(define (read-mk x) 
  (with-input-from-string (list->string x) read))

		      
;; Example, create a number token
(define p-digit   (p-test char-numeric?))
(define p-0       (p-char #\0))
(define p-.       (p-char #\.))
(define p-decimal (p-seq (p+ p-digit) 
			 (p-or (p-seq p-.
				      (p* p-digit))
			        p-true)))
			       
(define p-exponent (p-seq p-decimal 
			  (p-char #\e)
			  (p-or (p-char #\+)
				(p-char #\-)
				p-true)
			  p-decimal))

(define p-number    (p-or p-exponent p-decimal))
(define tok-number  (mk-number p-number read-mk))

------------------------------------------------------
For any of you, a good exercise is to download kanren and do a similar 
framework using return values Have fun!




                 reply	other threads:[~2013-08-10 21:22 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1453317.eM6fmhrFaU@warperdoze \
    --to=stefan.itampe@gmail.com \
    --cc=guile-devel@gnu.org \
    --cc=guile-user@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).