A Plan for Hacking

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

* A Plan for Hacking
@ 2011-09-24 23:58 Noah Lavine
  2011-09-25 20:50 ` Ludovic Courtès
  2012-01-04 18:29 ` Andy Wingo
  0 siblings, 2 replies; 9+ messages in thread
From: Noah Lavine @ 2011-09-24 23:58 UTC (permalink / raw)
  To: guile-devel

Hello all,

It looks to me like the PEG project is wrapping up. We're talking
about what to name things now, which means the library is likely
pretty close to inclusion in Guile. After this I will want another
Guile project to work on. My original intention was to work on a JIT
compiler for Guile, but after following the list for a while, it seems
like work on an ahead-of-time compiler might be more useful in the
long term. This email is about my idea for what I will spend time on
over the next several months. I'd like to talk about it beforehand to
make sure that what I do will be useful, and so that I can coordinate
with other people. This is especially important because I think we
should use an unusual architecture for our compiler.

I think the next step is to write a static analyzer. This analyzer
will be used by the compiler, but it will not just be for the compiler
- it will also be available to users. I think of it as a tool for
writing correct programs. Here are some use-cases for a static
analyzer:
  - You write a function that assumes its arguments are of a certain
type. You'd like to be sure this is true, so your program won't throw
exceptions in the middle.
  - You write a program with more than one thread, which includes a
series of mutexes. You'd like to verify that if a thread grabs more
than one of these mutexes, then it always does so in a certain order.
  - You write a program that will respond to HTTP requests. You'd like
to verify that information from a certain set of variables only
influences your response packets if the user has a certain cookie
(i.e. verify that your web server is secure).
  - You write some C code that manipulates Guile values. You'd like to
be sure that the code does all of the appropriate Guile typechecks
before using Guile library functions. (This is taken from the Guile
to-do list, on the website.)
  - You write a compiler for Guile code. You'd like to infer the types
of variables at compile-time.

I also have three design considerations that I think are important for
an analyzer like this:
  - For any given proposition to be checked, it should say either
"yes, this is true", "not true, and here's how it might fail", or "I
can't tell if this is true or false". We cannot omit the last
condition, because an analyzer without that option that can satisfy
all of these use-cases is impossible (Godel's theorem).
  - It should work for multiple languages. This is necessary to do the
C typechecks from the last use case, but I also hope it will be
expanded to more languages as part of Guile integrating its compiler
with GCC someday.
  - It should integrate nicely with all languages it supports. If it
isn't easy to use, people will be less likely to use it.

I said we should use an unusual compiler architecture because I think
that the static analyzer, which is normally part of the compiler,
should be thought of as its own component, which can be used
independently of the compiler. I hope the use-cases above have
convinced you that having an analyzer around could be a useful thing,
but if not, I'm happy to discuss it more. (I will also make an
argument that a static analyzer should be part of the interface to the
compiler, if you are interested.)

So this is my proposal for the next several months: I work on a static
analyzer for Guile, hoping to expand it to other languages as well.
The main goal will be to make it very easy to use this analyzer in
code, including providing helpful feedback to users when their
assertions fail. While developing this, I will make sure that it can
check things that would be useful to a compiler. Once we have a static
analyzer, we can use it as the first step in an optimizing compiler.

What do you think?
Noah

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: A Plan for Hacking
  2011-09-24 23:58 A Plan for Hacking Noah Lavine
@ 2011-09-25 20:50 ` Ludovic Courtès
  2011-09-25 22:33   ` Noah Lavine
  2012-01-04 18:29 ` Andy Wingo
  1 sibling, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2011-09-25 20:50 UTC (permalink / raw)
  To: guile-devel

Hi Noah!

Noah Lavine <noah.b.lavine@gmail.com> skribis:

> I think the next step is to write a static analyzer. This analyzer
> will be used by the compiler, but it will not just be for the compiler
> - it will also be available to users. I think of it as a tool for
> writing correct programs. Here are some use-cases for a static
> analyzer:
>   - You write a function that assumes its arguments are of a certain
> type. You'd like to be sure this is true, so your program won't throw
> exceptions in the middle.

That would be cool.  However, I suspect that for best results you’d want
procedures to have type annotations, as in Racket’s Typed Scheme and
Bigloo; otherwise it may be impossible to determine type info most of
the time—e.g., a public top-level procedure used in another module.

(It’s not clear to me whether this can be done without seriously
compromising on dynamic module composition.)

>   - You write a program with more than one thread, which includes a
> series of mutexes. You'd like to verify that if a thread grabs more
> than one of these mutexes, then it always does so in a certain order.
>   - You write a program that will respond to HTTP requests. You'd like
> to verify that information from a certain set of variables only
> influences your response packets if the user has a certain cookie
> (i.e. verify that your web server is secure).
>   - You write some C code that manipulates Guile values. You'd like to
> be sure that the code does all of the appropriate Guile typechecks
> before using Guile library functions. (This is taken from the Guile
> to-do list, on the website.)

These three items seem “off-topic” and less important to me.

>   - You write a compiler for Guile code. You'd like to infer the types
> of variables at compile-time.

Cool!

[...]

> So this is my proposal for the next several months: I work on a static
> analyzer for Guile, hoping to expand it to other languages as well.
> The main goal will be to make it very easy to use this analyzer in
> code, including providing helpful feedback to users when their
> assertions fail. While developing this, I will make sure that it can
> check things that would be useful to a compiler. Once we have a static
> analyzer, we can use it as the first step in an optimizing compiler.
>
> What do you think?

Sounds like a nice plan.  :-)

To reduce frustration, I would narrow the use cases a bit, or define
short-term milestones.  For instance, you could decide to focus solely
on type inference, or have type inference one of your major milestones.

Kanren [0] and its companion book look like nice starting points to me.
In particular, there’s a simple type inference engine written in Kanren
that is actually usable with Guile 2.0 out of the box and could serve as
a starting point.

And before that, make sure PEG actually gets merged.  ;-)

Thanks,
Ludo’.

[0] http://kanren.sourceforge.net/




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: A Plan for Hacking
  2011-09-25 20:50 ` Ludovic Courtès
@ 2011-09-25 22:33   ` Noah Lavine
  2011-09-26  9:11     ` Marijn
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Noah Lavine @ 2011-09-25 22:33 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hello,

>>   - You write a function that assumes its arguments are of a certain
>> type. You'd like to be sure this is true, so your program won't throw
>> exceptions in the middle.
>
> That would be cool.  However, I suspect that for best results you’d want
> procedures to have type annotations, as in Racket’s Typed Scheme and
> Bigloo; otherwise it may be impossible to determine type info most of
> the time—e.g., a public top-level procedure used in another module.
>
> (It’s not clear to me whether this can be done without seriously
> compromising on dynamic module composition.)

I think it can be done effectively, but I'm not quite sure what you
mean by "dynamic module composition." Do you mean that this might
prevent you from making modules that use arbitrary other modules? Or
that it might not work if you loaded modules at runtime from the REPL?

I agree that we'll want type annotation. But you can also think of
type annotation as a special case of the more general idea of caching
intermediate steps to speed up analysis. After all, in theory you
could always rederive the type information for each function from its
source code every time you needed it. I wonder if this is the only
case of caching that will matter, or if it is worthwhile to just make
a more general caching mechanism.

>> [...]
>
> These three items seem “off-topic” and less important to me.

Okay. I'm not sure what I think about them right now, because I am
very intrigued by the idea of using the analyzer to help correctness,
but you might be right. I will think about it more.

I thought of another use-case, and I wonder if you think this is
on-topic or not:

 - You write a parallel version of map that is fast, but only correct
if its argument has no side-effects. You'd like to use this parallel
map instead of the regular one, but only when it is correct.

> To reduce frustration, I would narrow the use cases a bit, or define
> short-term milestones.  For instance, you could decide to focus solely
> on type inference, or have type inference one of your major milestones.

Yes, type inference is a major thing. As I was thinking about a
compiler, the three things that seemed useful to me were type
inference, determining the outlines of big data structures that were
passed around (which could be type inference, depending on what your
type system is), and figuring out the lifetimes of heap-allocated
values.

> Kanren [0] and its companion book look like nice starting points to me.
> In particular, there’s a simple type inference engine written in Kanren
> that is actually usable with Guile 2.0 out of the box and could serve as
> a starting point.

Great! I will look at those. I have also found a book called Software
Foundations [0], which is an introduction to the Coq theorem prover.
It might be useful to be familiar with a few systems before starting
my own (or perhaps not even starting my own, but using Kanren). Do you
also happen to know of a book on formal logic systems in general, so I
can get more perspective? (I have one, but it doesn't cover as much as
I wish it did.)

I'm also interested in what Stefan Tampe is doing, because I believe
he's implementing a formal logic system in Guile. Stefan, does your
project connect to this one?

> And before that, make sure PEG actually gets merged.  ;-)

Yes, I will make sure that happens. Currently PEG is waiting on an
email about names. Once we agree on those, then I know of nothing
further blocking it from being merged.

Thanks,
Noah

[0] http://www.cis.upenn.edu/~bcpierce/sf/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: A Plan for Hacking
  2011-09-25 22:33   ` Noah Lavine
@ 2011-09-26  9:11     ` Marijn
  2011-09-26 11:10       ` William ML Leslie
  2011-09-26 10:31     ` Stefan Israelsson Tampe
  2011-09-28  9:38     ` Ludovic Courtès
  2 siblings, 1 reply; 9+ messages in thread
From: Marijn @ 2011-09-26  9:11 UTC (permalink / raw)
  To: Noah Lavine; +Cc: Ludovic Courtès, guile-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Noah,

On 09/26/11 00:33, Noah Lavine wrote:
> I thought of another use-case, and I wonder if you think this is 
> on-topic or not:
> 
> - You write a parallel version of map that is fast, but only
> correct if its argument has no side-effects. You'd like to use this
> parallel map instead of the regular one, but only when it is
> correct.

This seems to assume that map does its thing one element after
another, but that is not what the spec says it does. Note "The dynamic
order in which proc is applied to the elements of the lists is
unspecified." in the below R5RS quote:

(map proc list1 list2 . . . )

The lists must be lists, and proc must be a procedure taking
as many arguments as there are lists and returning a single
value. If more than one list is given, then they must all
be the same length. Map applies proc element-wise to the
elements of the lists and returns a list of the results, in
order. The dynamic order in which proc is applied to the
elements of the lists is unspecified.

Marijn
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6AQaUACgkQp/VmCx0OL2xmcQCfZS87CrelffaXA3Yu3moJwb/b
D/wAnAlE1zb4EOG1bsNG7aiN/v2KJrj3
=Zg5z
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: A Plan for Hacking
  2011-09-25 22:33   ` Noah Lavine
  2011-09-26  9:11     ` Marijn
@ 2011-09-26 10:31     ` Stefan Israelsson Tampe
  2011-09-28  9:38     ` Ludovic Courtès
  2 siblings, 0 replies; 9+ messages in thread
From: Stefan Israelsson Tampe @ 2011-09-26 10:31 UTC (permalink / raw)
  To: Noah Lavine; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 965 bytes --]

> I'm also interested in what Stefan Tampe is doing, because I believehe's
implementing a
> formal logic system in Guile. Stefan, does yourproject connect to this
one?

Yes I'm implementing the leanCop first order logic prover. Now this might
not be the same
as implementing a new Coq system. I don't know the difference. Also there is
a prover example in the kanren framework.

If you need to stick with scheme only use kanren as a base.

You could use my development if you can stand to load in a shared compiled
library. I prefere to have both systems available in a toolbox, kanren has
it's pros and cons and as well as a
more traditional prolog system has it's pros and cons.

For the typechecker. Check out what I'm doing in guile-unify as well. There
a big part of
the features in typed racket implemented. Currently it demands that
everything is typed
but I plan to loosen this as well in order to allow for untyped arguments
e.g. can be anything.

/Stefan

[-- Attachment #2: Type: text/html, Size: 1100 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: A Plan for Hacking
  2011-09-26  9:11     ` Marijn
@ 2011-09-26 11:10       ` William ML Leslie
  0 siblings, 0 replies; 9+ messages in thread
From: William ML Leslie @ 2011-09-26 11:10 UTC (permalink / raw)
  To: Marijn; +Cc: Ludovic Courtès, guile-devel

On 26 September 2011 19:11, Marijn <hkBst@gentoo.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Noah,
>
> On 09/26/11 00:33, Noah Lavine wrote:
>> I thought of another use-case, and I wonder if you think this is
>> on-topic or not:
>>
>> - You write a parallel version of map that is fast, but only
>> correct if its argument has no side-effects. You'd like to use this
>> parallel map instead of the regular one, but only when it is
>> correct.
>
> This seems to assume that map does its thing one element after
> another, but that is not what the spec says it does. Note "The dynamic
> order in which proc is applied to the elements of the lists is
> unspecified." in the below R5RS quote:
>
> (map proc list1 list2 . . . )
>
> The lists must be lists, and proc must be a procedure taking
> as many arguments as there are lists and returning a single
> value. If more than one list is given, then they must all
> be the same length. Map applies proc element-wise to the
> elements of the lists and returns a list of the results, in
> order. The dynamic order in which proc is applied to the
> elements of the lists is unspecified.

If you want to reason about the interference of effects, you might
like the original effect typing paper.  They were originally employed
as a means to infer accurate principal type in the presence of
mutability and type parametrism, which might also be useful to you if
you find HM unsuitable, but for reasoning about the /relative purity/
of a call I've found them invaluable.  The paper also touches on
region types, which may be used to remove heap allocation in many
cases, although that application of them is not discussed there.

The Type and Effect Discipline. Jean-Pierre Talpin. Pierre Jouvelot, ENS.

www.cs.ucla.edu/~palsberg/tba/papers/talpin-jouvelot-iandc94.pdf

-- 
William Leslie



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: A Plan for Hacking
  2011-09-25 22:33   ` Noah Lavine
  2011-09-26  9:11     ` Marijn
  2011-09-26 10:31     ` Stefan Israelsson Tampe
@ 2011-09-28  9:38     ` Ludovic Courtès
  2011-09-28 18:05       ` Noah Lavine
  2 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2011-09-28  9:38 UTC (permalink / raw)
  To: Noah Lavine; +Cc: guile-devel

Hi Noah,

>>>   - You write a function that assumes its arguments are of a certain
>>> type. You'd like to be sure this is true, so your program won't throw
>>> exceptions in the middle.
>>
>> That would be cool.  However, I suspect that for best results you’d want
>> procedures to have type annotations, as in Racket’s Typed Scheme and
>> Bigloo; otherwise it may be impossible to determine type info most of
>> the time—e.g., a public top-level procedure used in another module.
>>
>> (It’s not clear to me whether this can be done without seriously
>> compromising on dynamic module composition.)
>
> I think it can be done effectively, but I'm not quite sure what you
> mean by "dynamic module composition." Do you mean that this might
> prevent you from making modules that use arbitrary other modules? Or
> that it might not work if you loaded modules at runtime from the REPL?

I was referring to the ability to connect code snippets at run-time (by
means of use-modules, C-x C-e in Geiser, etc.) and have an evolving,
live system.

Having types associated with values makes this easier.  It may become
harder once the compiler has built assumptions into the generated code
about the types each procedure expects.

[...]

>> To reduce frustration, I would narrow the use cases a bit, or define
>> short-term milestones.  For instance, you could decide to focus solely
>> on type inference, or have type inference one of your major milestones.
>
> Yes, type inference is a major thing. As I was thinking about a
> compiler, the three things that seemed useful to me were type
> inference, determining the outlines of big data structures that were
> passed around (which could be type inference, depending on what your
> type system is), and figuring out the lifetimes of heap-allocated
> values.

Yes, that would be good too.

>> Kanren [0] and its companion book look like nice starting points to me.
>> In particular, there’s a simple type inference engine written in Kanren
>> that is actually usable with Guile 2.0 out of the box and could serve as
>> a starting point.
>
> Great! I will look at those. I have also found a book called Software
> Foundations [0], which is an introduction to the Coq theorem prover.
> It might be useful to be familiar with a few systems before starting
> my own (or perhaps not even starting my own, but using Kanren). Do you
> also happen to know of a book on formal logic systems in general, so I
> can get more perspective? (I have one, but it doesn't cover as much as
> I wish it did.)

About Kanren, there’s “The Reasoned Schemer”.

About Coq, I’ve heard that the Coq’Art, by Bertot & Castéran [0] is a
nice read, but it’s heavy.  ;-)

[0] http://www.labri.fr/perso/casteran/CoqArt/index.html

>> And before that, make sure PEG actually gets merged.  ;-)
>
> Yes, I will make sure that happens. Currently PEG is waiting on an
> email about names. Once we agree on those, then I know of nothing
> further blocking it from being merged.

Oh, right.

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: A Plan for Hacking
  2011-09-28  9:38     ` Ludovic Courtès
@ 2011-09-28 18:05       ` Noah Lavine
  0 siblings, 0 replies; 9+ messages in thread
From: Noah Lavine @ 2011-09-28 18:05 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hello,

>> I think it can be done effectively, but I'm not quite sure what you
>> mean by "dynamic module composition." Do you mean that this might
>> prevent you from making modules that use arbitrary other modules? Or
>> that it might not work if you loaded modules at runtime from the REPL?
>
> I was referring to the ability to connect code snippets at run-time (by
> means of use-modules, C-x C-e in Geiser, etc.) and have an evolving,
> live system.
>
> Having types associated with values makes this easier.  It may become
> harder once the compiler has built assumptions into the generated code
> about the types each procedure expects.

I agree. I think that for statically generating code, where we want to
do as much optimization as possible, we need something called a
"program definition", which is a list containing all entry points to
the program. Then you could assume that all control flow starts at
those entry points. I also think of this as something that the
compiler will eventually use as well - for instance, if you want to
generate a library, you would provide a list of the functions that
would be entry points. For a standard unix program, there would be
just one entry point. (The big advantage for the compiler is that the
list could also contain ABI information, which is useful for people
who make libraries that need to be ABI-compatible. I can argue for
this more if you like.)

It's not clear to me how well analysis will work in dynamically-edited
code, but I would like to do as much as possible there. I think the
key to that will be a combination of caching generated code, and
invalidating the caches when necessary, and generating specialized
versions of functions that are called by other compiled code but are
separate from the generic entry points to those functions.

> About Kanren, there’s “The Reasoned Schemer”.

I've ordered that book; it should be here soon. My plan is to read
things for the next couple weeks, to make sure I really understand how
an analyzer needs to work, come up with a plan for Guile's, and then
email the list again to discuss it.

Thanks,
Noah

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: A Plan for Hacking
  2011-09-24 23:58 A Plan for Hacking Noah Lavine
  2011-09-25 20:50 ` Ludovic Courtès
@ 2012-01-04 18:29 ` Andy Wingo
  1 sibling, 0 replies; 9+ messages in thread
From: Andy Wingo @ 2012-01-04 18:29 UTC (permalink / raw)
  To: Noah Lavine; +Cc: guile-devel

Hi Noah,

I never contributed to this thread, but I've had it marked for months
now, so FWIW...

On Sat 24 Sep 2011 19:58, Noah Lavine <noah.b.lavine@gmail.com> writes:

> So this is my proposal for the next several months: I work on a static
> analyzer for Guile, hoping to expand it to other languages as well.

This sounds cool.  I assume you're familiar with kCFA?  See
http://matt.might.net/articles/implementation-of-kcfa-and-0cfa/, for
example.

It doesn't seem to me that static analysis is a prerequisite for AOT
compilation -- and indeed, the current .go compilation is an example of
naive AOT compilation.

Static analysis can certainly facilitate optimization, especially
whole-program optimization, but it's orthogonal IMO.  Of course it has
other uses as well, and I do think it's an interesting thing to do.

Happy hacking,

Andy
-- 
http://wingolog.org/

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-01-04 18:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-24 23:58 A Plan for Hacking Noah Lavine
2011-09-25 20:50 ` Ludovic Courtès
2011-09-25 22:33   ` Noah Lavine
2011-09-26  9:11     ` Marijn
2011-09-26 11:10       ` William ML Leslie
2011-09-26 10:31     ` Stefan Israelsson Tampe
2011-09-28  9:38     ` Ludovic Courtès
2011-09-28 18:05       ` Noah Lavine
2012-01-04 18:29 ` Andy Wingo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).