From: Stefan Israelsson Tampe
Newsgroups: gmane.lisp.guile.user
Subject: schemishes sed
Date: Wed, 04 Sep 2013 22:27:33 +0200
Message-ID: <5606497.XAc5KV8WgY@warperdoze>
To: guile-user@gnu.org

Hi all,

As I mentioned in an earlier email, I've been poking at a grep and sed tool that knows about Scheme. To see where I'm heading, consider the following streamed output:

  (define (f)
    (format #t " (let ((x (+ 1 a)) (y 2) (z 3) (w 4)) (do-someting x y z w)) "))

The task is to write a program that changes the let to tel and swaps (y 2) to (2 y), while keeping the whitespace reasonably sane.
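To make the whitespace bookkeeping concrete, here is a small stand-alone sketch in plain Guile Scheme. It does not use the tool itself. As a simplification, it assumes each token is a vector #(l it r) holding the whitespace to its left, the token text, and the whitespace to its right. The helpers tok, tok->string, swap-tokens, rename-token and group->string are hypothetical names invented for this illustration. Swapping keeps every whitespace slot where it was and exchanges only the token texts, which is what keeps the layout sane.

```scheme
;; Sketch only, not the tool's API: a token is assumed to be a vector
;; #(l it r) of left whitespace, token text and right whitespace.

(define (tok l it r) (vector l it r))
(define (tok-l t) (vector-ref t 0))
(define (tok-it t) (vector-ref t 1))
(define (tok-r t) (vector-ref t 2))

;; Render a token back to text, whitespace included.
(define (tok->string t)
  (string-append (tok-l t) (tok-it t) (tok-r t)))

;; Exchange the token texts of X and Y but leave each whitespace slot in
;; place, mirroring the template #(l (x.l y.it x.r y.l x.it y.r) r).
(define (swap-tokens x y)
  (list (tok (tok-l x) (tok-it y) (tok-r x))
        (tok (tok-l y) (tok-it x) (tok-r y))))

;; Replace token text OLD by NEW, keeping the surrounding whitespace;
;; any other token is returned unchanged.
(define (rename-token t old new)
  (if (string=? (tok-it t) old)
      (tok (tok-l t) new (tok-r t))
      t))

;; Render a parenthesized group: outer left whitespace, "(", the
;; rendered items, ")", outer right whitespace.
(define (group->string l items r)
  (string-append l "(" (apply string-append (map tok->string items)) ")" r))
```

For example, rendering the swapped tokens of (y 2) with group->string gives "(2 y)", and renaming the token let yields tel with its whitespace untouched.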
We can take on this task by first defining match classes:

  (define-match-class swap
    (pattern #(l (x y) r)
             #:with tr #'#(l (x.l y.it x.r y.l x.it y.r) r)))

  (define-match-class (tr-it oldval newval)
    (pattern #(l ,oldval r)
             #:with tr #`#(l #,(datum->syntax #'a newval) r)))

This is similar to define-syntax-class in syntax-parse, but we allow ice-9 match semantics with ~and, ~or, ~not, _, ..., 'x, x and ` as usual. There is one extra form, (~var x (class a ...)) or (~var x class), which lets x match the syntax class class, just as in syntax-parse. The , (unquote) form in a pattern matches against a variable from the context outside the pattern, as with ,oldval in tr-it.

A variable x matches a token including its surrounding whitespace. Whitespace is matched greedily and can include comments; #; is treated as a token in its own right, and we work on it just as with normal Scheme. One can then use x.l, x.r and x.it, where x.l is the whitespace to the left, x.r the whitespace to the right and x.it the actual token. In the incoming stream each token is also bound to a vector #(l it r), and one can use that vector directly to match whitespace when, e.g., the it part is a constant rather than a variable. It is possible to use (~and x 3.14) in the matcher as well.

When assembling the result that should be inserted into the stream, one does not need to use vectors again, but vectors will work, as can be seen in the tr-it class. So swap swaps x and y while preserving whitespace, and tr-it translates oldval to newval.

To actually do the transform we can issue

  (define (test)
    (par-sed
     (scm-sed
      (#(l ((~var let (tr-it 'let 'tel))
            #(a ((~var bind swap) ...) b)
            body ...)
           r)
       #'#(l (let.tr #(a (bind.tr ...) b) body ...) r)))
     (f)))

and get

  scheme@(guile-user)> (test)
   (tel (((+ 1 a) x) (2 y) (3 z) (4 w)) (do-someting x y z w))

Nice! Note that what remains is to bind a Self procedure so that y and body ... can be translated recursively. That's on the current todo list. Also note how we made the evaluation composable:
scm-sed produces a matcher that prints the result if it matches and fails otherwise, and par-sed takes a matcher, a function that generates output on standard output and perhaps a few flags, and then applies the matcher to that output. This allows one to reuse scm-sed as an argument to a grepper when we only want to see the matched results, e.g.

  (par-grep (s-seq (scm-sed (pat c ...) ...) print-nl) (f))

This will actually output both the old and the new matched string.

So the tools are quite an interesting combination of syntax-parse and ice-9 match. They are quite fast because they only translate and create objects when there is a match; this works by actually using a matcher of the form

  (s-and silent-match (s-seq capture-sexp do-the-translation))

The silent match does almost no consing apart from closure creation and should be lightweight. The silent matcher also uses a backtracker tuned not to explode on you, so it should be quite OK: it performs enough cuts not to blow the stack or exhaust memory, and any Prolog variables are reclaimed properly, so it should be able to handle large files, provided no bugs remain in this respect.

The capturing step uses syntax-parse, as can be seen in the code generated for the matcher:

  (lambda (a b cc)
    (let ((m (f-or!
              (s-parens
               (f-seq (tr-it-match 'let 'tel)
                      (s-parens (f-seq (f* swap-match)))
                      (f* (sexp))))))
          (l ( (c)
               (.. (c) ((sexp! a b cc) c))
               ( (sed-print
                  ((lambda (x)
                     (syntax-parse x
                       (#(l ((~var let (tr-it-class 'let 'tel))
                             #(a ((~var bind swap-class) ...) b)
                             (~var body Sexp) ...)
                            r)
                        (syntax #(l (let.tr #(a (bind.tr ...) b) body ...) r)))))
                   c)))
               ( 'ok))))
      (f-and m l)))

Hence it is possible to add extra checks that restrict the match further than the silent matcher does.

Also on the list is the ability to stop the sed process and interact with the current silent match: e.g. one might want to change the output matcher and see if it matches, edit the output code for whitespace, or simply check why a match fails by getting a traced output.
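The "cheap silent check first, build only on success" design can be illustrated with plain closures. This is a minimal sketch under stated assumptions, not the tool's implementation. The names m-sym, m-any, m-seq and guarded are hypothetical stand-ins for combinators such as s-seq and s-and. A matcher is assumed to take a list of tokens and return the remaining tokens on success or #f on failure, and no backtracking or cuts are modelled. What it shows is that the builder, standing in for capture-sexp plus the translation, only runs after the silent matcher has succeeded.

```scheme
;; Sketch only, not the tool's API: a matcher is a procedure that takes a
;; list of tokens and returns the remaining tokens on success, #f on failure.

;; Match exactly one token equal to SYM.
(define (m-sym sym)
  (lambda (toks)
    (and (pair? toks) (equal? (car toks) sym) (cdr toks))))

;; Match any single token.
(define m-any
  (lambda (toks)
    (and (pair? toks) (cdr toks))))

;; Run matchers in order, threading the remaining tokens through them;
;; fail as soon as one of them fails.
(define (m-seq . ms)
  (lambda (toks)
    (let loop ((ms ms) (toks toks))
      (cond ((not toks) #f)
            ((null? ms) toks)
            (else (loop (cdr ms) ((car ms) toks)))))))

;; In the spirit of (s-and silent-match (s-seq capture-sexp ...)): BUILD,
;; the costly step that allocates the result, is only called on input
;; that the cheap SILENT matcher has already accepted.
(define (guarded silent build)
  (lambda (toks)
    (and (silent toks) (build toks))))

;; Example: rewrite a leading "let" to "tel", counting how often the
;; builder actually runs.
(define builds 0)

(define rewrite
  (guarded (m-seq (m-sym "let") m-any)
           (lambda (toks)
             (set! builds (+ builds 1))
             (cons "tel" (cdr toks)))))
```

Calling (rewrite '("let" "x")) returns ("tel" "x"), while (rewrite '("if" "x")) returns #f without ever invoking the builder, so builds is not incremented on failed matches.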
Anything is possible, and this could be a cool further endeavour. Anyway, I'll stop here for now and head on to try out other languages.

Cheers!