Unbound variables used within a form

unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed

* Unbound variables used within a form
@ 2012-10-13  0:12 Panicz Maciej Godek
  2012-10-13  5:42 ` Mark H Weaver
  2012-10-13 12:54 ` Ludovic Courtès
  0 siblings, 2 replies; 4+ messages in thread
From: Panicz Maciej Godek @ 2012-10-13  0:12 UTC (permalink / raw)
  To: guile-user

Hey,
I just wrote the biggest function in my life :)
It takes a scheme program and returns the list
of all variables (symbols) that are used but
unbound within that form. The code is pretty
straightforward:

(use-modules (srfi srfi-1)(ice-9 match)(srfi srfi-11))

(define (properize src . dst)
"Change an improper list into a proper list"
  (cond ((pair? src)
	 (apply properize (cdr src) (cons (car src) dst)))
	((null? src)
	 (reverse dst))
	(else
	 (reverse (cons src dst)))))

(define (used-variables form)
  (define (diff . args) (apply lset-difference equal? args))
  (define (join . args) (apply lset-union equal? args))
  (define (bound-variables bindings)
    (let-values (((names forms) (unzip2 bindings)))
      (values names (append-map used-variables forms))))
  (define (argument-name arg)
    (cond ((symbol? arg) arg)
	  ((pair? arg) (car arg))
	  (else #f)))
  (match form
    (((or 'let 'letrec 'letrec*) (bindings ...) body ...)
     (let-values (((shadowed used) (bound-variables bindings)))
       (join used (diff (append-map used-variables body) (diff
shadowed used)))))
    (('let (? symbol? name) (bindings ...) body ...)
     (let-values (((shadowed used) (bound-variables bindings)))
       (join used (diff (append-map used-variables body) (diff
shadowed used) (list name)))))
    (('begin body ...)
     (append-map used-variables body))
    (((or 'lambda 'lambda*) arg body ...)
     (cond
      ((or (pair? arg) (list? arg))
       (diff (append-map used-variables body) (filter-map
argument-name (properize arg))))
      ((symbol? arg)
       (diff (append-map used-variables body) (list arg)))))
    (((or 'define 'define*) (name ...) body ...)
     (diff (append-map used-variables body) name))
    (('define name value)
     (diff (used-variables value) name))
    (((or 'case-lambda 'case-lambda*) def ...)
     (apply join (map (match-lambda ((arg body)
				     (cond
				      ((symbol? arg)
				       (diff (append-map used-variables body) (list arg)))
				      ((or (pair? arg) (list? arg))
				       (diff (append-map used-variables body)
					     (filter-map argument-name (properize arg)))))))
		      def)))
    (((or 'if 'or 'and) expr ...)
     (append-map used-variables expr))
    (('quote data)
     '())
    (('quasiquote data)
     (letrec ((quasiquote-variables (match-lambda
				     (('unquote data) (used-variables data))
				     ((data ...) (append-map quasiquote-variables data))
				     (else '()))))
       (quasiquote-variables data)))
    (('@@ name ...)
     '())
    ((procedure ...)
     (append-map used-variables procedure))
    ((? symbol? variable)
       (list variable))
    (else
     '())))

At least for simple cases it seems to work alright.
It has been written specifically for guile, so it took into
account various primitive language constructs that
extend RnRS (like case-lambda or lambda*).

Obviously, feel free to use the code any way you like.
I just wanted to ask:
- beside lambda*, case-lambda, case-lambda*,
are there any additional extensions that guile introduced
atop the Report that should be taken into account in this code?
- is there any simpler way to achieve the same effect, ie.
to acquire all external symbols that are meaningful within
a form (assuming the core semantics, that is, that all
the macros were expanded)

Yours,
M



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Unbound variables used within a form
  2012-10-13  0:12 Unbound variables used within a form Panicz Maciej Godek
@ 2012-10-13  5:42 ` Mark H Weaver
  2012-10-13 11:40   ` Panicz Maciej Godek
  2012-10-13 12:54 ` Ludovic Courtès
  1 sibling, 1 reply; 4+ messages in thread
From: Mark H Weaver @ 2012-10-13  5:42 UTC (permalink / raw)
  To: Panicz Maciej Godek; +Cc: guile-user

Panicz Maciej Godek <godek.maciek@gmail.com> writes:
> I just wrote the biggest function in my life :)
> It takes a scheme program and returns the list
> of all variables (symbols) that are used but
> unbound within that form.

FYI, the standard term for these are "free variables".

> I just wanted to ask:
> - beside lambda*, case-lambda, case-lambda*,
> are there any additional extensions that guile introduced
> atop the Report that should be taken into account in this code?

You also need to handle 'with-fluids'.

> - is there any simpler way to achieve the same effect, ie.
> to acquire all external symbols that are meaningful within
> a form (assuming the core semantics, that is, that all
> the macros were expanded)

First of all, it would be much simpler if you traversed the tree-il
directly, instead of decompiling the tree-il to scheme.  Take a look at
module/language/tree-il.scm and module/language/tree-il/*.scm for
details and examples of how to traverse and analyze tree-il.

In tree-il, you wouldn't have to recognize bound variables and remove
them from the list.  The macro expander already does that job, and
specifically marks each variable reference by its type:
(1) <toplevel-ref> or <toplevel-set> for top-level variables,
(2) <module-ref> or <module-set> for variables in a different module,
(3) <lexical-ref> or <lexical-set> for non-toplevel variables, and
(4) <primitive-ref> for selected primitives.  You could simply scan for
these, discarding the lexicals.

If you still prefer to work with scheme sexps, then there are some
options you can pass to the tree-il->scheme decompiler that will make
your job easier.

(define* (my-expand form #:optional (module (current-module)))
  (decompile (compile form
                      #:from 'scheme #:to 'tree-il
                      #:opts '()
                      #:env module)
             #:from 'tree-il #:to 'scheme
             #:opts '(#:use-derived-syntax? #f
                      #:avoid-lambda? #f)))

By default, the decompiler attempts to reconstruct certain common macros
such as 'cond', 'case', 'and', 'or', 'let*', and named-let, to make the
code more readable by humans.  The "#:use-derived-syntax? #f" option
turns off this behavior, which will simplify code analysis tools.
Further, "#:avoid-lambda? #f" forces every procedure to use 'lambda',
'lambda*', 'case-lambda', or 'case-lambda*', so you won't have to worry
about 'define*', and all 'define' forms will have a simple symbol as the
first operand.

In combination, these two options will allow you to eliminate the
following cases from your code:

>   (match form
>     (((or 'let 'letrec 'letrec*) (bindings ...) body ...)
>      (let-values (((shadowed used) (bound-variables bindings)))
>        (join used (diff (append-map used-variables body) (diff
> shadowed used)))))

If not for "#:use-derived-syntax? #f", you would have had to consider
'let*' here.

>     (('let (? symbol? name) (bindings ...) body ...)
>      (let-values (((shadowed used) (bound-variables bindings)))
>        (join used (diff (append-map used-variables body) (diff
> shadowed used) (list name)))))

You won't need to handle named-let.

>     (('begin body ...)
>      (append-map used-variables body))
>     (((or 'lambda 'lambda*) arg body ...)
>      (cond
>       ((or (pair? arg) (list? arg))
>        (diff (append-map used-variables body) (filter-map
> argument-name (properize arg))))
>       ((symbol? arg)
>        (diff (append-map used-variables body) (list arg)))))
>     (((or 'define 'define*) (name ...) body ...)
>      (diff (append-map used-variables body) name))

You won't need to handle this case above.

>     (('define name value)
>      (diff (used-variables value) name))

This case will only happen at the top-level, and 'name' should not be
removed from the list here.

>     (((or 'case-lambda 'case-lambda*) def ...)
>      (apply join (map (match-lambda ((arg body)
> 				     (cond
> 				      ((symbol? arg)
> 				       (diff (append-map used-variables body) (list arg)))
> 				      ((or (pair? arg) (list? arg))
> 				       (diff (append-map used-variables body)
> 					     (filter-map argument-name (properize arg)))))))
> 		      def)))
>     (((or 'if 'or 'and) expr ...)
>      (append-map used-variables expr))

You don't need to handle 'or' or 'and'.

>     (('quote data)
>      '())
>     (('quasiquote data)
>      (letrec ((quasiquote-variables (match-lambda
> 				     (('unquote data) (used-variables data))
> 				     ((data ...) (append-map quasiquote-variables data))
> 				     (else '()))))
>        (quasiquote-variables data)))

You don't need to handle 'quasiquote', which is just a macro.

>     (('@@ name ...)
>      '())

You need to handle both '@@' and '@', and you cannot simply ignore them.

>     ((procedure ...)
>      (append-map used-variables procedure))
>     ((? symbol? variable)
>        (list variable))
>     (else
>      '())))

Having said all this, I think that you are taking very much the wrong
approach to how to save the code in your GUI development system.  Any
programming system needs to work with the *source* code, which means
"the preferred form for making changes".  In Scheme, that means the
source code before *any* compilation has taken place, even macro
expansion.

If you are working with code after macro expansion has taken place, then
you've already lost many of the key advantages of Scheme.  For example,
macro expansion turns a simple record type definition into a relatively
complex set of procedure definitions, which includes internal
implementation details that may change in future versions of Guile.

For another example, take a close look at module/ice-9/psyntax.scm, and
then look at module/ice-9/psyntax-pp.scm, which is the same file after
macro expansion.  Despite my best efforts to make it somewhat readable,
it is most definitely *not* the preferred form for making changes.

I would strongly encourage you to adopt a model where your development
system maintains and manipulates its own copy of the user's source code,
rather than trying to reconstruct the code from Guile's internal data
structures.  That way lies madness.

     Mark

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Unbound variables used within a form
  2012-10-13  5:42 ` Mark H Weaver
@ 2012-10-13 11:40   ` Panicz Maciej Godek
  0 siblings, 0 replies; 4+ messages in thread
From: Panicz Maciej Godek @ 2012-10-13 11:40 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

2012/10/13 Mark H Weaver <mhw@netris.org>:

> FYI, the standard term for these are "free variables"

I thought about this, but then I thought that it could
cause some confusion, as any variable that isn't bound
in a formula can be considered free (even the one that
does not appear in the formula) -- at least that's what
I recall from my course in logic. But if there's no actual
ambiguity, I can use that term, of course :)

>> I just wanted to ask:
>> - beside lambda*, case-lambda, case-lambda*,
>> are there any additional extensions that guile introduced
>> atop the Report that should be taken into account in this code?
>
> You also need to handle 'with-fluids'.

Thanks, I will take it into account :)

>> - is there any simpler way to achieve the same effect, ie.
>> to acquire all external symbols that are meaningful within
>> a form (assuming the core semantics, that is, that all
>> the macros were expanded)
>
> First of all, it would be much simpler if you traversed the tree-il
> directly, instead of decompiling the tree-il to scheme.  Take a look at
> module/language/tree-il.scm and module/language/tree-il/*.scm for
> details and examples of how to traverse and analyze tree-il.
>
> In tree-il, you wouldn't have to recognize bound variables and remove
> them from the list.  The macro expander already does that job, and
> specifically marks each variable reference by its type:
> (1) <toplevel-ref> or <toplevel-set> for top-level variables,
> (2) <module-ref> or <module-set> for variables in a different module,
> (3) <lexical-ref> or <lexical-set> for non-toplevel variables, and
> (4) <primitive-ref> for selected primitives.  You could simply scan for
> these, discarding the lexicals.

Thanks a lot! I think that this is what I've been looking for.
(It's horrible when the same functionality is implemented twice for
the same program)

> If you still prefer to work with scheme sexps, then there are some
> options you can pass to the tree-il->scheme decompiler that will make
> your job easier.
[...]
> In combination, these two options will allow you to eliminate the
> following cases from your code:

>>     (('@@ name ...)
>>      '())
>
> You need to handle both '@@' and '@', and you cannot simply ignore them.

I just thought about a silly optimization, but I guess you're right :]
(I mean, in my context it seems to work, but I see that it generally
isn't the case)

> Having said all this, I think that you are taking very much the wrong
> approach to how to save the code in your GUI development system.  Any
> programming system needs to work with the *source* code, which means
> "the preferred form for making changes".  In Scheme, that means the
> source code before *any* compilation has taken place, even macro
> expansion.
>
> If you are working with code after macro expansion has taken place, then
> you've already lost many of the key advantages of Scheme.  For example,
> macro expansion turns a simple record type definition into a relatively
> complex set of procedure definitions, which includes internal
> implementation details that may change in future versions of Guile.
[...]
> I would strongly encourage you to adopt a model where your development
> system maintains and manipulates its own copy of the user's source code,
> rather than trying to reconstruct the code from Guile's internal data
> structures.  That way lies madness.

Actually, I am taking a hybrid approach. The above function is needed
to elliminate unnecessary variables that are introduced to the lexical
context on function source re-generation. The source stored in the
procedure property list is left unexpanded, and therefore
human-readable.

The concept of the system I'm working on is still unclear, even to me
-- I only have a few rough ideas. In a way it's meant to be minimal,
as I don't want to devote too much time to create a big system that
would be thrown to trash afterwards. My main development decision is
to avoid making development decisions, or at least to put them off for
as long as I can. Everything can be redesigned eventually, once the
proper idea of what I am doing is found. The sollution I took has two
main advantages: it (apparently) works and is interesting to
implement.
And speaking of madness, isn't lisp programming about coming up with
clever ways to misuse a computer? ;]

Thank you for a very informative response! :)
M



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Unbound variables used within a form
  2012-10-13  0:12 Unbound variables used within a form Panicz Maciej Godek
  2012-10-13  5:42 ` Mark H Weaver
@ 2012-10-13 12:54 ` Ludovic Courtès
  1 sibling, 0 replies; 4+ messages in thread
From: Ludovic Courtès @ 2012-10-13 12:54 UTC (permalink / raw)
  To: guile-user

Hi!

Panicz Maciej Godek <godek.maciek@gmail.com> skribis:

> It takes a scheme program and returns the list
> of all variables (symbols) that are used but
> unbound within that form.

Just to complete Mark’s response, you might want to look at the
‘unbound-variable’ analysis in the compiler, namely in
language/tree-il/analyze.scm (it implements the compiler warning for
uses of possibly unbound variable.)

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-10-13 12:54 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-13  0:12 Unbound variables used within a form Panicz Maciej Godek
2012-10-13  5:42 ` Mark H Weaver
2012-10-13 11:40   ` Panicz Maciej Godek
2012-10-13 12:54 ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).