unofficial mirror of guile-user@gnu.org 
* schemishes sed
@ 2013-09-04 20:27 Stefan Israelsson Tampe
From: Stefan Israelsson Tampe @ 2013-09-04 20:27 UTC
  To: guile-user

Hi all, as I told you in an earlier email, I've been poking at a
grep and sed tool that knows about Scheme. To see where I'm heading,
just consider the following streamed output.

(define (f)
  (format #t
"

(let ((x (+ 1 a))
      (y 2)
      (z 3) 
      (w 4))
  (do-someting x y z w))


"))

The task is to write a program that changes the let to tel and swaps
each binding, e.g. (y 2) to (2 y), while keeping the whitespace
reasonably sane. We can take on this task by first defining match
classes,

(define-match-class swap
  (pattern #(l (x y) r) 
     #:with tr #'#(l (x.l y.it x.r y.l x.it y.r) r)))

(define-match-class (tr-it oldval newval)
  (pattern #(l ,oldval r)
     #:with tr #`#(l #,(datum->syntax #'a newval) r)))

This is similar to syntax-parse's define-syntax-class, but we allow
ice-9 match semantics with ~and, ~or, ~not, _ ... 'x x ` as usual.
There is one extra form, (~var x (class a ...)) or (~var x class),
which lets x match the syntax class class, just as in syntax-parse.
The , match symbol (unquote) matches against a variable from the
context outside the pattern.

A variable x matches a token including its whitespace (whitespace is
matched greedily and can include comments; #; is treated like a token
in itself and we work on it just as with normal Scheme). One can then
use x.l, x.r and x.it, with x.l being the whitespace to the left, x.r
the whitespace to the right and x.it the actual token. In the incoming
stream each token is bound to a vector #(l it r), and one can use that
directly to match whitespace when, e.g., the it is a constant and not
a variable. It is possible to use (~and x 3.14) in the matcher as
well. When we assemble the result that should be inserted into the
stream, we do not need to use vectors again, but vectors will work, as
can be seen in the tr-it class.
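
To make the #(l it r) convention concrete, here is a minimal sketch
that uses plain ice-9 match rather than the tool itself. The names
swap-binding and render are invented for this illustration, and it
assumes whitespace is stored as strings; the tool's real
representation may well differ.

```scheme
(use-modules (ice-9 match))

;; Assumption: a token is #(left-ws item right-ws), where the
;; whitespace parts are strings and a list item holds nested tokens.

;; Swap the two items of a binding while every piece of whitespace
;; stays in its original position, mirroring
;; (x.l y.it x.r y.l x.it y.r) in the swap class.
(define (swap-binding tok)
  (match tok
    (#(l (#(xl x xr) #(yl y yr)) r)
     (vector l (list (vector xl y xr) (vector yl x yr)) r))
    (_ #f)))

;; Turn a token back into text.
(define (render tok)
  (match tok
    (#(l it r)
     (string-append
      l
      (if (list? it)
          (string-append "(" (apply string-append (map render it)) ")")
          (object->string it))
      r))))

;; The binding (y 2) with two spaces of leading whitespace.
(define binding
  (vector "  " (list (vector "" 'y " ") (vector "" 2 "")) ""))

(display (render binding))                 ; the original text
(newline)
(display (render (swap-binding binding)))  ; items swapped, ws kept
(newline)
```

Because only the it slots change places, the space between the two
items and the indentation in front of the binding survive the swap.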

So swap will swap x and y preserving whitespace. tr-it will translate
an oldval to newval.

Now to actually do the transform we can issue,



(define (test)
  (par-sed (scm-sed (#(l ((~var let (tr-it 'let 'tel))
                          #(a ((~var bind swap) ...) b)
                          body ...) r)
                     #'#(l (let.tr #(a (bind.tr ...) b) body ...) r)))
           (f)))

And get

scheme@(guile-user)> (test)



(tel (((+ 1 a) x)
      (2 y)
      (3 z) 
      (4 w))
  (do-someting x y z w))


Nice! Note that what remains is to bind a Self procedure to be able
to do recursive translations of y and body ... . That's on the current
todo list. Also note how we made the evaluation composable, e.g.

  scm-sed produces a matcher that, if it matches, prints the result
          and otherwise fails
  par-sed takes a matcher, a std-output generating function and
          perhaps a few flags and then

This allows one to reuse scm-sed as an argument to a grepper when we
only want to see the matched results e.g.

(par-grep (s-seq (scm-sed (pat c ...) ...) print-nl) (f))

This will actually output the old and the new matched string.
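
The idea of handing one matcher to different drivers can be sketched
in a few lines. This is only an illustration on plain s-expressions,
not the tool's stream-based implementation: let->tel, run-sed and
run-grep are hypothetical names, and a matcher is assumed to be a
procedure that returns a rewritten datum or #f.

```scheme
(use-modules (ice-9 match) (srfi srfi-1))

;; Assumption: a "matcher" takes one datum and returns the rewritten
;; datum on success or #f on failure.
(define (let->tel form)
  (match form
    (('let bindings body ...) `(tel ,bindings ,@body))
    (_ #f)))

;; sed-like driver: rewrite what matches, pass everything else through.
(define (run-sed matcher forms)
  (map (lambda (form) (or (matcher form) form)) forms))

;; grep-like driver: keep only the matches, as (old new) lists.
(define (run-grep matcher forms)
  (filter-map (lambda (form)
                (let ((new (matcher form)))
                  (and new (list form new))))
              forms))

(define forms '((let ((y 2)) y) (display "hi")))

(write (run-sed let->tel forms))
(newline)
(write (run-grep let->tel forms))
(newline)
```

The matcher knows nothing about which driver calls it, which is what
lets the same pattern serve both the sed and the grep case.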

So the tools are quite an interesting combination of syntax-parse and
ice-9 match. It is quite fast because it only translates and creates
objects when there is a match, so it works by actually using a matcher
of the form,

  (s-and silent-match
         (s-seq capture-sexp do-the-translation))
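
A rough sketch of that ordering, on plain data instead of a token
stream: the names both, then, silent-let?, capture and translate are
hypothetical stand-ins, and the real s-and and s-seq are backtracking
stream matchers, so this only shows why the expensive part runs after
a cheap check.

```scheme
;; Assumption: a matcher takes a datum and returns a result or #f.

;; Stand-in for s-and: run SECOND only if FIRST succeeds.
(define (both first second)
  (lambda (x) (and (first x) (second x))))

;; Stand-in for s-seq: feed the result of STEP to NEXT.
(define (then step next)
  (lambda (x)
    (let ((v (step x)))
      (and v (next v)))))

;; Cheap check: inspects the shape only and allocates nothing.
(define (silent-let? x)
  (and (pair? x) (eq? (car x) 'let) (pair? (cdr x))))

;; Expensive part: builds new data, so it runs only after the check.
(define (capture x)
  (list (cadr x) (cddr x)))                 ; bindings and body
(define (translate parts)
  (cons 'tel (cons (car parts) (cadr parts))))

(define two-phase-let->tel
  (both silent-let? (then capture translate)))

(write (two-phase-let->tel '(let ((y 2)) y)))
(newline)
(write (two-phase-let->tel '(display "hi")))
(newline)
```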

As you see, the silent match does almost no consing apart from closure
creation and should be lightweight. The silent matcher also uses a
backtracker tuned not to explode on you, so it should be quite OK. It
does enough cuts to not blow the stack or memory, and any Prolog
variables are reclaimed properly, i.e. it should be able to handle
large files if no bugs remain in this respect. The capturing sexp uses
syntax-parse, which can be seen in the code output for the matcher,
e.g.

(lambda (a b cc)
  (let ((m (f-or! (s-parens
                    (f-seq (tr-it-match 'let 'tel)
                           (s-parens (f-seq (f* swap-match)))
                           (f* (sexp))))))
        (l (<p-lambda>
             (c)
             (.. (c) ((sexp! a b cc) c))
             (<code>
               (sed-print
                 ((lambda (x)
                    (syntax-parse
                      x
                      (#(l
                         ((~var let (tr-it-class 'let 'tel))
                          #(a ((~var bind swap-class) ...) b)
                          (~var body Sexp)
                          ...)
                         r)
                       (syntax
                         #(l (let.tr #(a (bind.tr ...) b) body ...) r)))))
                  c)))
             (<p-cc> 'ok))))
    (f-and m l)))

Hence it is possible to add extra checks that restrict the match
further than the silent matcher does. Also on the list is adding the
possibility to stop the sed process and actually interact with the
current silent match: e.g. one might want to change the output matcher
and see if it matches, one might want to edit the output code for
whitespace, or one might simply want to check why it fails by getting
a traced output. Anything is possible and this could be a cool further
endeavour. Anyway, I'll stop here for now and head on to try out other
languages.

Cheers!



