Hi,

The patch below (against HEAD) is a proposal to "improve" the module
system in several ways:

  1. Remove inconsistencies in how it behaves.

  2. Get better documentation and test coverage.

  3. Improve performance.

(1) has to do mainly with `module-use!' vs. `module-use-interfaces!' (as
was discussed recently).  Namely the fact that duplicate processing is
not always performed, depending on whether one uses `module-use!' or
some other means to use a module.  The patch solves this issue by making
duplicate processing inescapable.  Likewise, variable lookup currently
has two implementations (which have the same behavior, though): the C
`module_variable ()' and the Scheme `module-variable'.  The patch leaves
only one implementation of that.

There's still more to do to achieve (2) (notably actual documentation
;-)) but it's getting better.  Hopefully `modules.test' could eventually
cover enough of the API to serve as documentation of sorts.

(3) is two-fold:

  3.a. Duplicate processing.

  3.b. Variable lookup.

Although duplicates should be the exception rather than the rule[*],
duplicate processing is pretty costly: the current `process-duplicates'
is roughly O(N*USES), where N is the number of bindings in the interface
to be imported and USES is the number of modules currently used by the
module (because `module-import-interface' is O(USES)).
`module-use-interfaces!' is also terribly costly (calculating its
complexity is left as an exercise to the reader ;-)).  Likewise,
variable lookup (e.g., in `module_variable ()') is O(USES).  I believe
that both may have a noticeable impact on startup time.

The patch addresses this by changing the data structures used by
modules: instead of a list of used modules, it uses a second "obarray",
called the "import obarray", that maps symbols to the modules providing
them.  This makes duplicate processing O(N) where N is the number of
bindings in the module to be imported, and variable lookup time is
independent of the number of modules imported.  The import obarray is
populated when `module-use!' is invoked (e.g., when `define-module' or
`use-modules' is processed).  Because of this, autoloading can no longer
be implemented using `make-autoload-interface' (otherwise, modules would
get loaded immediately, during `process-duplicates'): instead, the new
`module-autoload!' modifies the binder of the user module.
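
To make the difference concrete, here is a minimal sketch of the two
lookup strategies.  It is a toy model, not code from the patch: an
"interface" is reduced to a plain hash table, all the helper names are
made up for illustration, and duplicate handling is left out (the last
import simply wins).

```scheme
;; Toy model: an "interface" is a hash table mapping symbols to values.
;; Every helper name below is hypothetical and not part of the patch.

(define (make-interface alist)
  (let ((table (make-hash-table)))
    (for-each (lambda (pair) (hashq-set! table (car pair) (cdr pair)))
              alist)
    table))

;; Uses-list scheme: walk the list of used interfaces, O(USES) per lookup.
(define (lookup/uses-list uses sym)
  (let loop ((uses uses))
    (cond ((null? uses) #f)
          ((hashq-get-handle (car uses) sym) => cdr)
          (else (loop (cdr uses))))))

;; Import-obarray scheme: one extra table maps each imported symbol to
;; the interface that provides it.  It is filled once, at import time,
;; in O(N) where N is the number of bindings in INTERFACE.
(define (import-interface! import-obarray interface)
  (hash-for-each (lambda (sym value)
                   (hashq-set! import-obarray sym interface))
                 interface))

;; A lookup is then two hash lookups, whatever the number of imports.
(define (lookup/import-obarray import-obarray sym)
  (let ((provider (hashq-ref import-obarray sym)))
    (and provider (hashq-ref provider sym))))

;; Usage: both strategies find the same binding.
(define a (make-interface '((foo . 1) (bar . 2))))
(define b (make-interface '((baz . 3))))
(define imports (make-hash-table))
(import-interface! imports a)
(import-interface! imports b)
(lookup/uses-list (list a b) 'baz)     ; scans A, then B
(lookup/import-obarray imports 'baz)   ; does not depend on the uses list
```

The point of the sketch is only the shape of the trade-off: work moves
from lookup time to import time.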

The module system allows bindings to be added dynamically to a module
(e.g., with `module-define!') in such a way that the newly added binding
is immediately visible to the module users.  In order to retain these
semantics, modules in the patched version have to "observe" the modules
they use in order to update their "import obarray" upon modification of
the used modules.  This is achieved using weak observers where the
observer procedure invokes `process-duplicates' when a used module is
changed.
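
The observer mechanism can be sketched along the following lines.
Again this is a toy model under simplifying assumptions: the names are
hypothetical, the observers are held strongly rather than weakly, and
the "refresh" step merely re-records the provider of each symbol
instead of running a real `process-duplicates'.

```scheme
;; Toy model, hypothetical names.  A "module" is a pair made of its
;; binding table and a mutable list of observer procedures.
(define (make-toy-module) (cons (make-hash-table) '()))
(define toy-bindings car)
(define toy-observers cdr)

(define (toy-observe! module proc)
  (set-cdr! module (cons proc (cdr module))))

(define (toy-define! module sym value)
  (hashq-set! (toy-bindings module) sym value)
  ;; Notify every user so that it can refresh its import obarray.
  (for-each (lambda (proc) (proc module))
            (toy-observers module)))

(define (toy-use! import-obarray used)
  ;; Re-import all the bindings of CHANGED; this stands in for the
  ;; duplicate processing done by the observer procedure.
  (define (refresh! changed)
    (hash-for-each (lambda (sym value)
                     (hashq-set! import-obarray sym changed))
                   (toy-bindings changed)))
  (refresh! used)
  (toy-observe! used refresh!))

;; Usage: a binding added after the import becomes visible to the user.
(define core (make-toy-module))
(define user-imports (make-hash-table))
(toy-define! core 'raise 'core-raise)
(toy-use! user-imports core)
(toy-define! core 'late 'added-later)
(hashq-ref user-imports 'late)   ; now maps to CORE
```

Note that every `toy-define!' triggers a full refresh in each observer,
which is what makes dynamic additions comparatively expensive in this
model.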

This has several implications.  First, duplicate processing occurs the
same way for dynamically added bindings as for "statically imported"
bindings.  Second, it makes load-time-dependent duplicate policies such
as `last' and `first' irrelevant (their outcome becomes inherently
non-deterministic).  Imagine a module that loads `srfi-34' (after
THE-SCM-MODULE) and then updates its import obarray as a result of a
modification in THE-SCM-MODULE: the update will replace the previous
value of `raise' (that from `srfi-34') with the core binding for
`raise'.  Third, it makes dynamic addition of bindings relatively
costly.  For instance, adding bindings at run-time to THE-SCM-MODULE can
cause all already loaded modules to re-process duplicates.

GOOPS makes use of `module-define!' after `(oop goops)' is used by the
various GOOPS modules, specifically in `create_smob_classes ()' so that
`(oop goops)' exports classes for all SMOB types (`<module>', etc.).  In
order to work around this problem, the patch modifies GOOPS so that (1)
`(oop goops)' exports only a predefined set of SMOB classes, and (2) the
SMOB classes are added to a separate module called `(oop goops
smob-classes)'.  Since only `(oop goops)' uses it, it is the only one
that needs to re-process duplicates as new SMOB classes are added.


From a performance viewpoint, the improvement yielded by the new
`process-duplicates' is significant.  It can be observed by
(synthetically) creating a new module, having it import hundreds of
modules with tens of bindings, and then invoking:

  (module-use-interfaces! m (list the-modules-to-import))

(`module-use-interfaces!' already invokes `process-duplicates' in
current Guile.)  From the measurements I've made, the new version is
around 40 times faster than the current one.

The change in variable lookup time can be measured using the worst case,
namely by looking up variables that do not exist in the module---this is
arguably unfair to the current module implementation.  Again, there is a
significant difference between the two implementations (the patched
version is almost instantaneous):

  (module-ref a-module-that-imports-lots-of-modules (gensym))
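
A rough way to set up such a measurement is sketched below.  This is
not the script at [0]; the helper names are hypothetical, and I am
assuming that `make-module' can be called without arguments and that
`module-variable' returns #f for an unbound symbol (which avoids the
error that `module-ref' signals when no default is given).

```scheme
;; Hypothetical helpers.  Only make-module, module-define!, module-use!,
;; module-variable, gensym and the timing primitives are Guile's own.

;; Return a fresh module containing BINDINGS freshly named variables.
(define (make-populated-module bindings)
  (let ((m (make-module)))
    (do ((i 0 (+ i 1)))
        ((= i bindings) m)
      (module-define! m (gensym "b") i))))

;; Return a fresh module that uses MODULES populated modules.
(define (make-heavy-importer modules bindings)
  (let ((user (make-module)))
    (do ((i 0 (+ i 1)))
        ((= i modules) user)
      (module-use! user (make-populated-module bindings)))))

;; Return the elapsed time, in seconds, of COUNT lookups of symbols
;; that are bound nowhere (the worst case for a linear search).
(define (time-failed-lookups module count)
  (let ((start (get-internal-real-time)))
    (do ((i 0 (+ i 1)))
        ((= i count))
      (module-variable module (gensym)))
    (exact->inexact
     (/ (- (get-internal-real-time) start)
        internal-time-units-per-second))))

;; Usage: hundreds of modules with tens of bindings each.
(define heavy (make-heavy-importer 300 30))
(time-failed-lookups heavy 1000)
```

The sizes (300 modules, 30 bindings, 1000 lookups) are arbitrary and
only meant to match the "hundreds of modules with tens of bindings"
scenario.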

However, the module construction cost is much higher with the new data
structure since `beautify-user-module!' has to populate the user
module's import obarray instead of just appending a module to its uses
list.  This is optimized by caching a standard module import obarray (in
`%scm-import-obarray') and then simply copying it in
`beautify-user-module!', using the new `hash-table-copy' primitive.
Without `hash-table-copy', the new `beautify-user-module!' is more than
200 times slower than the old one.  With `hash-table-copy', it is "only"
100 times slower.

The tiny script at [0] contains tools and instructions to reproduce
these measurements.


So the question is: is the `beautify-user-module!' overhead compensated
by the variable lookup and duplicate processing gains?

An application of mine [1], although it modifies `the-scm-module' at
run-time, requiring 40 modules to re-process duplicates, has its
execution time reduced by 8% (on a run that loads around 100 modules).
The whole test suite runs about 10% faster with the modified version
(although it has a larger `modules.test').  So it seems to be beneficial
performance-wise.  I'd be happy if people could try it out with other
applications (e.g., Lilypond ;-)) and measure the difference it makes.


Algorithmically, the module system could be further optimized by
removing the use list computation from `module-use!' (the use list is
used by `cond-expand', but `module-uses' could be implemented by
traversing the module obarray).  It could also be "micro-optimized" by
removing the "eval closure" indirection since it does not seem to be
useful.

I hope this long email will lead to a warm discussion!  :-)

Thanks,
Ludovic.

PS: The patch is still a rough draft.


[*] R6RS libraries _disallow_ duplicate binding imports:
    http://www.r6rs.org/document/html/r6rs-Z-H-2.html#node_toc_node_sec_6.1

[0] http://www.laas.fr/~lcourtes/software/guile/module-duplicates.scm
[1] http://www.nongnu.org/skribilo/