From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Israelsson Tampe Newsgroups: gmane.lisp.guile.user,gmane.lisp.guile.devel Subject: Parsing and logic programming Date: Sat, 10 Aug 2013 23:22:05 +0200 Message-ID: <1453317.eM6fmhrFaU@warperdoze> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7Bit X-Trace: ger.gmane.org 1376169759 20471 80.91.229.3 (10 Aug 2013 21:22:39 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Sat, 10 Aug 2013 21:22:39 +0000 (UTC) To: guile-user@gnu.org, guile-devel@gnu.org Original-X-From: guile-user-bounces+guile-user=m.gmane.org@gnu.org Sat Aug 10 23:22:39 2013 Return-path: Envelope-to: guile-user@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1V8GcQ-0001OT-Up for guile-user@m.gmane.org; Sat, 10 Aug 2013 23:22:39 +0200 Original-Received: from localhost ([::1]:47197 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1V8GcQ-0005aX-FP for guile-user@m.gmane.org; Sat, 10 Aug 2013 17:22:38 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:35769) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1V8GcA-0005a1-FV for guile-user@gnu.org; Sat, 10 Aug 2013 17:22:30 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1V8Gc1-0005Wk-LD for guile-user@gnu.org; Sat, 10 Aug 2013 17:22:22 -0400 Original-Received: from mail-lb0-x233.google.com ([2a00:1450:4010:c04::233]:55670) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1V8Gc1-0005WT-8p; Sat, 10 Aug 2013 17:22:13 -0400 Original-Received: by mail-lb0-f179.google.com with SMTP id v1so3905682lbd.24 for ; Sat, 10 Aug 2013 14:22:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:subject:date:message-id:user-agent:mime-version :content-transfer-encoding:content-type; bh=YUTaaAxeg2/ZW7OXJv4Jq4qB8Lzwp8uI6sZMMqrmkfo=; b=rvGwhf3Xa4kQIwu9cp4mMTW9B5K96klZiwVfxbdv/FWWXK8ax5pP+P4Vql7wtkks6x 897UIHgSsrgg9voHK30G9KMq27v1st0f3+D4H6i4ildq4+ESCXULbvmWo2yef9u0msfH oirJoNUapLAlHU5jueYWdoMUVkY4V17DPNTgReKyH9Om0GmSZ8JW0+8RTlOKue+f77PR eYWCceEW4tdZJ2tTQS8m76K5ZNM1Szr4gl9LSDqDlxFMzi6r16R/T268AFdzFwQ9UJQy 9GM/PHDdHnTBq/L5l4Sjlhgdpy7dr2X+UOXFMkPd1CKD2JadMEba3scDytH8Mu3gDHNK ZeLw== X-Received: by 10.152.20.1 with SMTP id j1mr8407445lae.59.1376169731566; Sat, 10 Aug 2013 14:22:11 -0700 (PDT) Original-Received: from warperdoze.localnet (1-1-1-39a.veo.vs.bostream.se. [82.182.254.46]) by mx.google.com with ESMTPSA id xr1sm4039744lbb.14.2013.08.10.14.22.09 for (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sat, 10 Aug 2013 14:22:10 -0700 (PDT) User-Agent: KMail/4.9.5 (Linux/3.5.0-30-generic; KDE/4.9.5; x86_64; ; ) X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2a00:1450:4010:c04::233 X-BeenThere: guile-user@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: General Guile related discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-user-bounces+guile-user=m.gmane.org@gnu.org Original-Sender: guile-user-bounces+guile-user=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.user:10615 gmane.lisp.guile.devel:16551 Archived-At: Hi all! >From time to time I return to logic programming. Scheme has kanren, but I prefer the more feature rich environment that comes with guile-log (It actually sports a kanren interface as well that should be the fastest way to run kanren on guile. A quite recent feature I added was support for returning multiple values in logic programs. The mechanism is based on enlarging the continuation closures with more arguments. This means that although everything is recursive, we do always stick to tail calls and hence these logic programs does not risk of blowing the stack (It can blow the heap though and main kanren contains mechanisms can even not blow the heap). One method to not blow the heap in guile log is to use return values for some set of algorithms. There is a drawback I must confess using the return values, and that is the we do not get a 'equations' like features for items. This is fine sometimes but inferior at other times. To see an example of the equation property is e.g. (kanren-cons car cdr result) car and cdr is input and result is the output in our minds, but it's just ass fine to enter result= (1 . 2) car = var, cdr = var and get the car = 1 and cdr = 2 after the function have been handled. What is return values, well this is forcing things to be only return values. In guile-log we do this by e.g., 1. put ( x y z) in tail position in e.g. function f. 2. "call" a function f with ( (x y z) (f a b c)) note how we cannot peak bindings to x y z in function f. This shows that we are limiting ourself on the ability of making a 'equational' like logic. To note here is that x y z will be just normal scheme variables that can be used further down the logic chain. When is this logic preferable? Well it's possible to make parsers using logic programming and make use of their backtracking features. It will be slower to do so then good 'ol targeted tools but the toolbox that comes with guile-log means that advanced stuff can be pretty simple to do. To exemplify, let's try to use guile-log to deduce a little framework for parsing and tokenizing. Here we go (DISCLAIMER NOT DEBUGGED CODE). ------------------------------------------------ (use-modules (logic guile-log)) (use-modules ((ice-9 match) #:renamer (prepend-tag 'ice-9-'))) ;; this one catches the common pattern of matching and gives ;; some syntactic sugare, please read on. (define-syntax-rules (pmatch (x xl c n m) (p1 p2 code ...) ...) ( (#:mode +) (x c) (p1 p2 ( ( code ...))) ... (_ _ ( )))) ;; read on (define-syntax-rules (plambda (x xl c n m) . l) ( (x xl c n m) (pmatch (x xl c n m) . l))) ;; we need some notion of always fail and always success (define p-false ( x )) (define p-true ( (x xl c n m) ( x xl c n m))) ;; se how we have the ice-9 matcher framework tooled to have prolog ;; like lambdas. the fuction signature (x xl c n m) are standard and ;; will consist of ;; x = the stream of characters for one line (list) ;; xl = list of lines e.g. xl = (list x1 x2 ...) ;; c = this is the imprint of the result. ;; n = col number ;; m = line number (define (p-test f) (plambda (x xl c n m) (((? f ch) . l) (ch . cc) ( x l cc (+ n 1) m)))) #| The pattern is (list-stream-pat result-pat code ...) Typically c is the incomming inprint ans id just a unbound variable, unifying against (ch . cc) means that cc is bound to the cdr a unbound varaible and and the car of c will havd the ch inprint, just as in prolog. list-stream-pat = (? f ch) - we check if f is true on ch result-pat = (ch . cc) - inprint cc new unbound variable and we just return the values with ( x l cc (+ n 1) m)))) ; note that we add one col number. |# ;; Lets define a ideom that generat a "matcher" for a char. ;; This shows how we use p-test (define (p-char ch) (p-test (lambda (x) (eq? x ch)))) ;; a sequence of matchers can be defined using e.g (p-seq f1 f2 f3 f4 ...) (define (p-seq . l) (ice-match l (() p-true ;; zero always matches ((f1) f1) ;; one matcher is just one matcher ((f1 . l) (let ((f2 (apply p-seq l))) ;; f2 is the sequence of the rest of ;; the matchers ;; The matcher is: ( (x xl c n m) (values (x xl c n m) (f1 x xl c n m)) ;; First call f1, get return (f2 x xl c n m)))))) ;; Then scan the rest. ;; ok, lets assume that we want a and b and c to match the same prefix ;; and that we return the last prefix, here is the code for that (define (p-and . l) (ice-match l (() p-true ((f) f) ((f . l) (let ((f2 (apply p-and l))) ( (x xl c n m) ( (f x xl c n m) (f2 x xl c n m))))))) ;; This will try to match a if that fails b the one that successes ;; return their values. (define (p-or . l) (ice-match l (() p-false ((f) f) ((f . l) (let ((f2 (apply p-or l))) ( (x xl c n m) ( (f x xl c n m) (f2 x xl c n m))))))) #| Now we have made a small toolbox of basic logic for combining matcher, you see the pattern and we could add interleaving versions and so on of and etc, quite easally. Anyhow, people typically write tokenizer and parser separatly because the tokens are w well confined entity and is typically unambighues and can be decided ffrom just local properties. To mimic this separation one can employ the following little tokenizer construct that essentially creates tokens and store them. If we backtrack and reissue the logic the stored value will be used. |# ;; Some global state (define run-n 0) (define run-m 0) (define map #f) ;; We need to reset the state (define (clear-tokens) (set! run-n 0) (set! run-m 0) (set! map (make-has-table))) (clear-tokens) ;; The idea of this function is to perform a tokenizing activity ;; utilizing this means that we loose the ability to redo and undo ;; generally this is not as effective as a regexp tokenizer but should be a ;; much faster then doing a full parser of everything. (define (mk-token f mk) ( (x xl c n m) ;; Check to see if we are in new territory (if (or (> m run-m) (and (= run-m m) (> n run-n))) ;; Virgin land here ( (cnew) ;; Create a fresh variable (values (x xl cc n2 m2) (f x xl cnew n m)) ;; Call the matcher (<=> cc '()) ;; '() marks end ;; of token ( ((val (mk ( cnew)))) ;; get the scheme ;; list and apply mk ( ;; store the new land (set! run-n n2) (set! run-m m2) (hash-set! map (cons n m) (list x xl val n2 m2)) ( () (c) ;; add val to the ((,val . o)) ;; parent res stream ( x xl o n2 m2)))) ( ((v (hash-ref map (cons n m) #f))) ;; retrieve token data ( () (v c) ;; add it to the parent ((x xl val n m) (val . o) ;; res stream ( x xl o n m)))))) ;; zero or more f matches (define (p* f) (p-or (p-seq f (p* f)) p-true)) ;; one or more f matches (define (p+ f) (p-seq f (p* f))) ;; read-mk, used in tokenizer when sheme can translate a token (define (read-mk x) (with-input-from-string (list->string x) read)) ;; Example, create a number token (define p-digit (p-test char-numeric?)) (define p-0 (p-char #\0)) (define p-. (p-char #\.)) (define p-decimal (p-seq (p+ p-digit) (p-or (p-seq p-. (p* p-digit)) p-true))) (define p-exponent (p-seq p-decimal (p-char #\e) (p-or (p-char #\+) (p-char #\-) p-true) p-decimal)) (define p-number (p-or p-exponent p-decimal)) (define tok-number (mk-number p-number read-mk)) ------------------------------------------------------ For any of you, a good exercise is to download kanren and do a similar framework using return values Have fun!