unofficial mirror of guile-user@gnu.org 
* Need help embedding Guile
@ 2021-12-21 11:12 Dimitris Papavasiliou
  2021-12-21 11:37 ` Maxime Devos
  0 siblings, 1 reply; 16+ messages in thread
From: Dimitris Papavasiliou @ 2021-12-21 11:12 UTC (permalink / raw)
  To: guile-user

Hi all,

I'm in the process of embedding Guile in an application and although I seem to
have the essentials working, I'd appreciate some confirmation of the validity of
my approach and also some tips on a couple of loose ends.

I won't bore you with the specifics of my application; for the purposes of the
discussion, its most important characteristic is that it uses Guile as a
frontend of sorts.  By this, I mean that a Scheme program is executed, which
creates objects in the C (well actually, and out of necessity, C++) domain.
These objects represent geometric operations in the form of a graph, and are
evaluated once the Scheme program has terminated.  Evaluating the graph can take
a long time, whereas the Scheme code, which simply creates nodes in the graph,
is expected to run to completion quite quickly, even though it can be
conceptually complex.  The emphasis on the Scheme side is therefore on
debuggability rather than efficiency.

My aim is to be able to load a Scheme program from a file, run it to have the
graph created and then clean up.  On error, I'd like to print out diagnostic
information in the form of an error message with as accurate a source location
as possible and a stack trace.  (I'd also like to print the latter with my own
formatting to match the rest of the application's output.)

Although perhaps other approaches are possible, I have, for now, chosen to leave
memory management to the C++ side, so that my foreign objects need custom
finalization.  The basic layout of my current implementation, with uninteresting
portions left out, is the following (where `run_scheme' is called by the main
program to run a Scheme script):

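#include <libguile.h>

struct context {
    char *input, **first, **last;
};

// Forward declarations, since these are referenced before being defined.
static void *run_scheme_from_guile(void *data);
static SCM run_body(void *data);
static SCM pre_handler(void *data, SCM key, SCM args);
static SCM post_handler(void *data, SCM key, SCM args);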

int run_scheme(const char *input, char **first, char **last)
{
    struct context context = {const_cast<char *>(input), first, last};

    scm_with_guile(&run_scheme_from_guile, &context);

    return 0;
}

static void *run_scheme_from_guile(void *data)
{
    struct context *context = static_cast<struct context *>(data);

    scm_set_automatic_finalization_enabled(0);

    // Define some foreign objects types and subroutines.
    // [...]

    scm_set_program_arguments(
        context->last - context->first, context->first, context->input);

    scm_c_catch(SCM_BOOL_T,
                run_body, reinterpret_cast<void *>(context),
                post_handler, nullptr,
                pre_handler, nullptr);

    scm_gc();
    scm_run_finalizers();

    return nullptr;
}

static SCM run_body(void *data)
{
    struct context *context = static_cast<struct context *>(data);

    scm_primitive_eval(
        scm_list_2(
            scm_from_latin1_symbol("load"),
            scm_from_latin1_string(context->input)));

    return SCM_UNSPECIFIED;
}

static SCM pre_handler(void *data, SCM key, SCM args)
{
    SCM s = scm_make_stack(SCM_BOOL_T, SCM_EOL);
    SCM p = scm_current_error_port();

    scm_print_exception(p, SCM_BOOL_F, key, args);
    scm_display_backtrace(s, p, SCM_BOOL_F, SCM_BOOL_F);

    return SCM_BOOL_T;
}

static SCM post_handler(void *data, SCM key, SCM args)
{
    return SCM_BOOL_T;
}

Actually, my code in `pre_handler' is not quite what is shown above, as I print
the stack with my own formatting, but let's leave that for later.  As I said,
this seems to be working, but certain points are unclear to me after reading all
the documentation I could find and snooping around in Guile's source code:

1. The manual is not very specific about how and when finalizers are run.  The
   approach above seems to correctly finalize all objects created as the Scheme
   code executes, but if references are kept, say via (define), they are not
   finalized and I get memory leaks.  Is there some way to arrange for the
   complete deinitialization of Guile after I've finished evaluating Scheme code
   and making sure that all finalizers are run?

2. If, in `run_body', I simply do

       scm_c_primitive_load(context->input);

   then the code is evaluated, but on error I get no locations in the stack
   trace.  The error is said to have occurred "in an unknown file" with no line
   numbers.  Evaluating `load' as shown above seems to produce proper source
   locations in the stack trace.  Is there something else I should preferably be
   doing?

3. More generally, is there a preferable way to go about embedding Guile for my
   use case?

Thanks in advance for any pointers,
Dimitris




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Need help embedding Guile
  2021-12-21 11:12 Need help embedding Guile Dimitris Papavasiliou
@ 2021-12-21 11:37 ` Maxime Devos
  2021-12-21 13:45   ` Maxime Devos
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Maxime Devos @ 2021-12-21 11:37 UTC (permalink / raw)
  To: Dimitris Papavasiliou, guile-user

Dimitris Papavasiliou schreef op di 21-12-2021 om 11:12 [+0000]:
> [...]

> 1. The manual is not very specific about how and when finalizers are
>    run.  The approach above seems to correctly finalize all objects
>    created as the Scheme code executes, but if references are kept, say
>    via (define), they are not finalized and I get memory leaks.  Is
>    there some way to arrange for the complete deinitialization of Guile
>    after I've finished evaluating Scheme code and making sure that all
>    finalizers are run?

The manual is not very specific on when finalizers are run, because
there aren't many formal guarantees (e.g., BDW-GC is a conservative GC,
so it may consider an object reachable, and never finalize it, even
though it is actually unreachable).

About deinitialising Guile: I don't know.
About finalizers: No. From the BDW-GC FAQ:

 I want to ensure that all my objects are finalized and reclaimed
before process exit. How can I do that?

You can't, and you don't really want that. This would require
finalizing reachable objects. Finalizers run later would have to be
able to handle this, and would have to be able to run with randomly
broken libraries, because the objects they rely on were previously
finalized. In most environments, you would also be replacing the
operating system's mechanism for very efficiently reclaiming process
memory at process exit with a significantly slower mechanism.

You do sometimes want to ensure that certain particular resources are
explicitly reclaimed before process exit, whether or not they become
unreachable. Programming techniques for ensuring this are discussed in

``Destructors, Finalizers, and Synchronization'', Proceedings of the
2003 ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, Jan. 2003, pp. 262-272.

> 2. If, in `run_body', I simply do
> 
>        scm_c_primitive_load(context->input);
> 
>    then the code is evaluated, but on error I get no locations in the
>    stack trace.  The error is said to have occurred "in an unknown file"
>    with no line numbers.  Evaluating `load' as shown above, seems to
>    produce proper source locations in the stack trace.  Is there
>    something else I should be preferably doing?

For bootstrapping reasons, there are multiple readers and evaluators
in Guile, of varying debuggability. I'm not 100% sure, but I think
the 'primitive-load' reader+evaluator has low debuggability and the
'load' procedure has higher debuggability.

> 3. More generally, is there a preferable way to go about embedding
>    Guile for my use case?

Instead of reinitialising and deinitialising Guile repeatedly (seems
inefficient!), I would suggest initialising Guile once at program start
and doing Guile stuff whenever needed.

However, that might be incompatible with your memory management
approach ...

Greetings,
Maxime.





* Re: Need help embedding Guile
  2021-12-21 11:37 ` Maxime Devos
@ 2021-12-21 13:45   ` Maxime Devos
  2021-12-21 13:48   ` Maxime Devos
  2021-12-21 21:50   ` Dimitris Papavasiliou
  2 siblings, 0 replies; 16+ messages in thread
From: Maxime Devos @ 2021-12-21 13:45 UTC (permalink / raw)
  To: Dimitris Papavasiliou, guile-user

Hi,

Maxime Devos schreef op di 21-12-2021 om 11:37 [+0000]:
> >    approach above seems to correctly finalize all objects created as
> >    the Scheme code executes, but if references are kept, say via
> >    (define), they are not finalized and I get memory leaks.

You can (set! the-global-variable #f) to clear the reference,
though there is still no formal guarantee that BDW-GC will collect it.

Greetings,
Maxime





* Re: Need help embedding Guile
  2021-12-21 11:37 ` Maxime Devos
  2021-12-21 13:45   ` Maxime Devos
@ 2021-12-21 13:48   ` Maxime Devos
  2021-12-21 21:50   ` Dimitris Papavasiliou
  2 siblings, 0 replies; 16+ messages in thread
From: Maxime Devos @ 2021-12-21 13:48 UTC (permalink / raw)
  To: Dimitris Papavasiliou, guile-user

Maxime Devos schreef op di 21-12-2021 om 11:37 [+0000]:
> About finalizers: No. From the BDW-GC faq: [...]

I misread your question; this answer doesn't apply exactly to it.
However, there are still no formal guarantees that BDW-GC will
collect everything.

Greetings,
Maxime





* Re: Need help embedding Guile
  2021-12-21 11:37 ` Maxime Devos
  2021-12-21 13:45   ` Maxime Devos
  2021-12-21 13:48   ` Maxime Devos
@ 2021-12-21 21:50   ` Dimitris Papavasiliou
  2021-12-21 22:39     ` Olivier Dion via General Guile related discussions
  2 siblings, 1 reply; 16+ messages in thread
From: Dimitris Papavasiliou @ 2021-12-21 21:50 UTC (permalink / raw)
  To: Maxime Devos; +Cc: guile-user

Hi Maxime,

Many thanks for your response; it was very helpful.  Unfortunately I'm now not
so sure that I have the basics of embedding Guile more or less working and, even
worse, I'm not really sure Guile is meant to work in the way I'm trying to use
it.

The idea is that the C++ program, after some initialization, loads and evaluates
one or more Scheme files (provided by the user as command line arguments).
During the course of their evaluation, these create objects on the C++ side
(representing the work that is to be done) and, once they're evaluated, the work
of Guile is done.  At that point, ideally, I'd like to deinitialize/terminate
Guile, both to reclaim resources which are no longer necessary and to ensure
that it plays no further role in the execution of the rest of the program.  As
far as I can see, this is not possible.

Furthermore, as you have pointed out, I cannot ensure that all created foreign
objects are finalized.  The idea here seems to be that some objects might still
be reachable at the very end, as far as the GC can tell, and will, at any rate,
be reclaimed by the operating system when the process exits.  But in my case,
where the role of the embedded language is restricted to the initial phase of
the embedding program's execution, this not only needlessly takes control of
these resources away from the embedding program; the unfinalized objects also
show up as memory leaks in tools like LeakSanitizer and Valgrind, which is a big
problem in itself.  Given the inability to tear down/kill Guile explicitly, I
can't see a way around this.

That is not entirely true.  I could perhaps keep track of objects that need to
be finalized myself, and finalize them manually after Scheme code evaluation is
done.  This also seems to be what's recommended in one of the sources you
quoted:

On Tuesday, December 21st, 2021 at 1:37 PM, Maxime Devos wrote:

> You do sometimes want to ensure that certain particular resources are
> explicitly reclaimed before process exit, whether or not they become
> unreachable. Programming techniques for ensuring this are discussed in
> ``Destructors, Finalizers, and Synchronization'', Proceedings of the
> 2003 ACM SIGPLAN-SIGACT Symposium on Principles of Programming
> Languages, Jan. 2003, pp. 262-272.

I'm still not sure I find this to be a satisfactory approach though.  Not only
is it non-trivial in terms of implementation, it also feels like I'm going
against the current, trying to bend Guile into a role it's not meant to serve.

> You can (set! the-global-variable #f) to clear the reference,
> though still there are no formal guarantees BDW-GC will collect things.

True.  Furthermore, the Scheme code, being user-provided, could establish any
number of references in various ways.  As far as I can see, there is no way to
completely "clear" an environment, i.e. remove all bindings contained in it, or
to somehow delete it altogether or tell the GC that it is no longer reachable,
so again I see no way around this.

If anyone has any comments or ideas, they would be most welcome.

Dimitris

PS:

> Due to bootstrapping reasons, there are multiple readers and evaluators
> in Guile, of varying debugability. I'm not 100% sure, but I think
> the 'primitive-load' reader+evaluator has low debugability and the
> 'load' procedure has higher debugability?

I'll leave diagnostics for later, perhaps for a different thread, as the matters
discussed above seem to be more serious (and potentially fatal).





* Re: Need help embedding Guile
  2021-12-21 21:50   ` Dimitris Papavasiliou
@ 2021-12-21 22:39     ` Olivier Dion via General Guile related discussions
  2021-12-22 11:41       ` Dimitris Papavasiliou
  0 siblings, 1 reply; 16+ messages in thread
From: Olivier Dion via General Guile related discussions @ 2021-12-21 22:39 UTC (permalink / raw)
  To: Dimitris Papavasiliou, Maxime Devos; +Cc: guile-user

On Tue, 21 Dec 2021, Dimitris Papavasiliou <dpapavas@protonmail.ch> wrote:

> The idea is that the C++ program, after some initialization, loads and evaluates
> one or more Scheme files (provided by the user as command line arguments).
> During the course of their evaluation, these create objects on the C++ side
> (representing the work that is to be done) and, once they're evaluated the work
> of Guile is done.  At that point, ideally, I'd like to deinitialize/terminate
> Guile, both to reclaim resources which are no longer necessary and to ensure
> that it plays no further role in the execution of the rest of the program.  As
> far as I can see, this is not possible.

From this description, what I understand is that you want to use Scheme
as a configuration file for batching the operations to be done in a
second phase in C++.  However, I fail to see why you need to finalize
these objects, since you're going to use them in your second phase.

> If anyone has any comments or ideas, they would be most welcome.

One approach I can think of would be to fork the process and create your C++
objects in a shared memory area between the parent and the child.  Once
Guile is done reading your inputs, the child process dies and all its
memory is reclaimed by the OS.
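
A minimal sketch of that fork-and-share idea, assuming a POSIX system (`fork', `mmap' with MAP_SHARED | MAP_ANONYMOUS, `waitpid').  `ShapeRecord', `SharedArea' and `run_in_child' are hypothetical names, not part of Guile or of the application.  The big caveat: only plain data can live in the shared mapping, since pointers into the child's private heap (shared_ptr control blocks, objects from external libraries) are meaningless once the child has exited.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstddef>
#include <new>

// Hypothetical plain-data record filled in by the child. Nothing stored in
// the shared mapping may point into the child's private heap.
struct ShapeRecord {
    int kind;        // e.g. 0 = cube, 1 = difference (illustrative only)
    double size[3];
};

struct SharedArea {
    std::size_t count;
    ShapeRecord records[64];
};

// Runs `produce` in a forked child that writes into an anonymous shared
// mapping, waits for the child, and hands the mapping to the parent.
// Returns nullptr on any failure; the caller releases it with munmap.
template <typename F>
SharedArea *run_in_child(F produce)
{
    void *mem = mmap(nullptr, sizeof(SharedArea), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return nullptr;

    SharedArea *area = new (mem) SharedArea{};  // zero-initialised

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(SharedArea));
        return nullptr;
    }
    if (pid == 0) {
        // Child: this is where Guile would be initialised and the user's
        // script loaded. Everything it allocates dies with the process.
        bool ok = produce(*area);
        _exit(ok ? 0 : 1);
    }

    // Parent: only trust the data if the child exited cleanly.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        munmap(mem, sizeof(SharedArea));
        return nullptr;
    }
    return area;
}
```

The parent never initialises Guile at all; it only reads back what the child wrote, which is also why the approach does nothing for C++ objects that need real destructors.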

> I'd appreciate some confirmation of the validity of my approach and
> also some tips on a couple of loose ends.

I think it's a valid approach.

-- 
Olivier Dion
Polymtl




* Re: Need help embedding Guile
  2021-12-21 22:39     ` Olivier Dion via General Guile related discussions
@ 2021-12-22 11:41       ` Dimitris Papavasiliou
  2021-12-22 13:52         ` Thien-Thi Nguyen
                           ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Dimitris Papavasiliou @ 2021-12-22 11:41 UTC (permalink / raw)
  To: Olivier Dion; +Cc: Maxime Devos, guile-user

Hi Olivier,

Thanks for chiming in!

On Wednesday, December 22nd, 2021 at 12:39 AM, Olivier Dion wrote:

> From this description, what I understand is that you want to use Scheme
> as a configuration file for batching the operations to be done in a
> second phase in C++.

From an architectural point of view, the situation is probably similar
to simple configuration parsing, as you say.  In practice though, the
scope of Guile's role won't be nearly as limited as what this might
imply.

In a way, what I'm working on can be described as a "compiler" for
geometry.  That is to say, it reads in the description of geometry in
some language and then evaluates it and dumps it in a format that can
be used for e.g. 3D printing.  Guile (amongst other options) supplies
the language frontend.  For example, one might invoke it from the shell
like this:

$ compile hollow_cube.scm

Here, `hollow_cube.scm' would contain, say:

(write-stl
  "hollow_cube.stl"
  (difference
    (cube 2 2 2)
    (cube 1 1 1)))

When the `cube' procedure is evaluated, it creates a node in the C++
backend that evaluates to the geometry of a cube.  Similarly
`difference' creates a node, with edges to the two cubes, which when
evaluated computes their boolean difference and so on.  Of course, in
this instance, the Scheme code amounts to little more than what might
be termed a configuration file, but imagine the code to create a
complex mechanical part and the situation is (conceptually) quite
different.

Creating the complete graph before evaluation begins in the second
phase is probably not necessary (nodes could be evaluated as they're
created), but it creates the opportunity for certain optimizations
(like dead code elimination for instance).  This makes some sort of
forcing/ensuring that Guile has terminated desirable.
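
To make the dead code elimination point concrete, here is a sketch of a node graph in which the only nodes worth evaluating are those reachable from the outputs (the `write-stl' calls).  `Node', `make_node' and `live_nodes' are hypothetical names; the real backend's types will surely differ.  It also illustrates why waiting for the script to finish matters here: the live set can only be computed once no more outputs can be added.

```cpp
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical graph node: an operation name plus its input nodes.
struct Node {
    std::string op;
    std::vector<std::shared_ptr<Node>> inputs;
};

using NodePtr = std::shared_ptr<Node>;

NodePtr make_node(std::string op, std::vector<NodePtr> inputs = {})
{
    return std::make_shared<Node>(Node{std::move(op), std::move(inputs)});
}

// Depth-first walk from the sinks (e.g. write-stl outputs). Any node the
// script created that is absent from the returned set is dead code and
// never needs to be evaluated.
std::unordered_set<const Node *> live_nodes(const std::vector<NodePtr> &sinks)
{
    std::unordered_set<const Node *> live;
    std::vector<const Node *> stack;
    for (const auto &s : sinks)
        stack.push_back(s.get());
    while (!stack.empty()) {
        const Node *n = stack.back();
        stack.pop_back();
        if (!n || !live.insert(n).second)
            continue;  // null, or already visited
        for (const auto &in : n->inputs)
            stack.push_back(in.get());
    }
    return live;
}
```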

> However, I failed to see why you need to finalize these objects
> since you're going to use them in your second phase?

Foreign objects currently come in two categories:

1. Complete geometric operations such as `cube' and `difference'
   above.  These are allocated on the C++ side and a so-called "smart
   pointer" (shared_ptr) is exported to Scheme.  Failure to finalize
   this retains a reference on the C++ side, which would prevent
   destroying the operation.  Since these can get quite large in terms
   of memory, destroying them after they're no longer needed can be
   essential.

2. "Complex" arguments to the above operations such as
   transformations, planes, points, etc., which are C++ classes, for
   which the destructor needs to be called in one way or the other.
   These are generally passed by value to the operations and can
   therefore be freed once the first phase is complete.
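
A sketch of category 1, assuming the foreign object's payload is a heap-allocated std::shared_ptr.  In the real embedding `finalize_handle' would be called from the finalizer passed to `scm_make_foreign_object_type', after retrieving the pointer with `scm_foreign_object_ref'; here it is called directly, and `Operation', `OperationHandle' and `wrap' are hypothetical.  It shows the effect described above: until the handle is finalized, the operation stays alive even after the graph has let go of it.

```cpp
#include <memory>

// Hypothetical geometric operation owned by the C++ backend.
struct Operation {
    virtual ~Operation() = default;
};

// What the Scheme foreign object would carry. Creating one adds a
// reference to the operation; destroying it drops that reference.
struct OperationHandle {
    std::shared_ptr<Operation> op;
};

OperationHandle *wrap(const std::shared_ptr<Operation> &op)
{
    return new OperationHandle{op};
}

// Body of the finalizer for the foreign object type. If it never runs,
// the Operation is never released.
void finalize_handle(OperationHandle *h)
{
    delete h;
}
```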

> One way I think of would be to fork the process and create your C++
> objects in a shared memory area between the parent and the child. Once
> Guile is done reading your inputs, the child process dies and all its
> memory is reclaimed by the OS.

I have considered that but, besides the fact that it seems a bit
heavy-handed, it still won't allow me to make sure the C++ objects are
properly finalized.  This creates two problems: for one, some objects
might need finalization beyond just freeing memory, and I'd
have to make sure that this isn't the case for each specific type
(which typically come from external libraries).  And for another,
these would show up as leaks in tools such as Valgrind, which would
make checking for memory bugs difficult.

One idea would be to simply call `scm_gc()' and `scm_run_finalizers()'
until the latter returns 0.  As far as I can see, this should ensure
all finalizers are called, assuming no references to any foreign
objects remain, but I see no way of ensuring the latter short of
process termination...
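
That loop might look like the following sketch, with one change: an upper bound on the number of rounds, since with a conservative collector there is no guarantee that every object ever becomes collectable.  The collector and finalizer calls are passed in as callables so the sketch compiles without libguile; `drain_finalizers' is a hypothetical helper.

```cpp
#include <functional>

// Repeatedly collect and run pending finalizers until a round finalizes
// nothing or `max_rounds` is reached. Returns the total number finalized.
// The cap matters: a conservative GC may keep some objects alive forever,
// so "until everything is finalized" might never happen.
int drain_finalizers(const std::function<void()> &collect,
                     const std::function<int()> &run_finalizers,
                     int max_rounds = 16)
{
    int total = 0;
    for (int round = 0; round < max_rounds; ++round) {
        collect();
        int n = run_finalizers();
        if (n <= 0)
            break;  // nothing was pending this round
        total += n;
    }
    return total;
}
```

In the embedding, one would presumably call it as `drain_finalizers([] { scm_gc(); }, [] { return scm_run_finalizers(); })', relying on `scm_run_finalizers()' returning the number of finalizers it ran.  Stopping when a round returns 0 only means nothing more is pending, not that every foreign object has been finalized.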

Dimitris





* Re: Need help embedding Guile
  2021-12-22 11:41       ` Dimitris Papavasiliou
@ 2021-12-22 13:52         ` Thien-Thi Nguyen
  2021-12-22 22:05           ` Dimitris Papavasiliou
  2021-12-22 14:46         ` Olivier Dion via General Guile related discussions
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Thien-Thi Nguyen @ 2021-12-22 13:52 UTC (permalink / raw)
  To: Dimitris Papavasiliou; +Cc: guile-user



[cc trimmed]

() Dimitris Papavasiliou <dpapavas@protonmail.ch>
() Wed, 22 Dec 2021 11:41:22 +0000

   One idea would be to simply call `scm_gc()' and
   `scm_run_finalizers()' until the latter returns 0.  As far as
   I can see, this should ensure all finalizers are called,
   assuming no references to any foreign objects remain, but I
   see no way of ensuring the latter short of process
   termination...

Would guardians help here?

I typed "scheme guardians finalization" into DuckDuckGo and the
first hit was:

 https://www.cs.tufts.edu/~nr/cs257/archive/kent-dybvig/guardians.pdf

-- 
Thien-Thi Nguyen -----------------------------------------------
 (defun responsep (query)               ; (2021) Software Libero
   (pcase (context query)               ;       = Dissenso Etico
     (`(technical ,ml) (correctp ml))
     ...))                              748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502




* Re: Need help embedding Guile
  2021-12-22 11:41       ` Dimitris Papavasiliou
  2021-12-22 13:52         ` Thien-Thi Nguyen
@ 2021-12-22 14:46         ` Olivier Dion via General Guile related discussions
  2021-12-22 15:23           ` Maxime Devos
  2021-12-22 15:29         ` Maxime Devos
  2021-12-22 17:37         ` Mikael Djurfeldt
  3 siblings, 1 reply; 16+ messages in thread
From: Olivier Dion via General Guile related discussions @ 2021-12-22 14:46 UTC (permalink / raw)
  To: Dimitris Papavasiliou; +Cc: Maxime Devos, guile-user

On Wed, 22 Dec 2021, Dimitris Papavasiliou <dpapavas@protonmail.ch> wrote:

> Foreign objects currently come in two categories:
>
> 1. Complete geometric operations such as `cube' and `difference'
>    above.  These are allocated on the C++ side and a so-called "smart
>    pointer" (shared_ptr) is exported to Scheme.  Failure to finalize
>    this retains a reference on the C++ side, which would prevent
>    destroying the operation.  Since these can get quite large in terms
>    of memory, destroying them after they're no longer needed can be
>    essential.

Since you have a graph of all the primitives in the second phase, you're
basically doing garbage collection there.  But from:

> Creating the complete graph before evaluation begins in the second
> phase is probably not necessary (nodes could be evaluated as they're
> created), but it creates the opportunity for certain optimizations
> (like dead code elimination for instance).  This makes some sort of
> forcing/ensuring that Guile has terminated desirable.

If I understood correctly, objects can become garbage before phase 2 and
thus never appear in the final graph of operations.

> One idea would be to simply call `scm_gc()' and `scm_run_finalizers()'
> until the latter returns 0.  As far as I can see, this should ensure
> all finalizers are called, assuming no references to any foreign
> objects remain, but I see no way of ensuring the latter short of
> process termination...

One way I think you could do this is to evaluate all the user operations
in a sandbox environment.

Example:
--------------------------------------------------------------------------------
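(use-modules (ice-9 sandbox))
;; ...
;; A binding set is (MODULE-NAME BINDING ...): name the primitives to expose.
(let ([mod (make-sandbox-module (cons '((my-app primitives) cube difference write-stl)
                                       all-pure-bindings))])
   ;; `eval-user-file' stands in for whatever evaluates the user's file; it
   ;; must itself be bound in MOD.  eval-in-sandbox also applies default
   ;; time and allocation limits (#:time-limit, #:allocation-limit).
   (eval-in-sandbox '(eval-user-file "...") #:module mod))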
--------------------------------------------------------------------------------

From `eval-in-sandbox` documentation:

 If SEVER-MODULE? is true (the default), the module will be unlinked
 from the global module tree after the evaluation returns, to allow MOD
 to be garbage-collected.

So I _think_ you're guaranteed here that all references in your module
will be garbage collected.  You can then do a single `gc/finalizer`.

-- 
Olivier Dion
Polymtl




* Re: Need help embedding Guile
  2021-12-22 14:46         ` Olivier Dion via General Guile related discussions
@ 2021-12-22 15:23           ` Maxime Devos
  0 siblings, 0 replies; 16+ messages in thread
From: Maxime Devos @ 2021-12-22 15:23 UTC (permalink / raw)
  To: Olivier Dion, Dimitris Papavasiliou; +Cc: guile-user

Olivier Dion schreef op wo 22-12-2021 om 09:46 [-0500]:
> From `eval-in-sandbox` documentation:
> 
>  If SEVER-MODULE? is true (the default), the module will be unlinked
>  from the global module tree after the evaluation returns, to allow MOD
>  to be garbage-collected.
> 
> So I _think_ you're guaranteed here that all references in your module
> will be garbage collected.  You can then do a single `gc/finalizer`.

No, there are no guarantees, because the GC (bdw-gc) is conservative.
However, _most_ unreachable objects should be collected (though not
all!).





* Re: Need help embedding Guile
  2021-12-22 11:41       ` Dimitris Papavasiliou
  2021-12-22 13:52         ` Thien-Thi Nguyen
  2021-12-22 14:46         ` Olivier Dion via General Guile related discussions
@ 2021-12-22 15:29         ` Maxime Devos
  2021-12-22 22:14           ` Dimitris Papavasiliou
  2021-12-22 17:37         ` Mikael Djurfeldt
  3 siblings, 1 reply; 16+ messages in thread
From: Maxime Devos @ 2021-12-22 15:29 UTC (permalink / raw)
  To: Dimitris Papavasiliou, Olivier Dion; +Cc: guile-user

Hi,

Dimitris Papavasiliou schreef op wo 22-12-2021 om 11:41 [+0000]:
> Creating the complete graph before evaluation begins in the second
> phase is probably not necessary (nodes could be evaluated as they're
> created), but it creates the opportunity for certain optimizations
> (like dead code elimination for instance). 
> 

This seems clear to me (I think you mean that there's some Guile code
that preprocesses the node graph to optimise it a little before
evaluating things). I don't see what dead code elimination has to do
with the structure of your program, though (it's a Guile thing, not a
node graph thing?).

>  This makes some sort of
> forcing/ensuring that Guile has terminated desirable.

... but I don't see how this follows. The only benefit I see from
ensuring Guile terminates is freeing a little memory. But since
Guile is basically used as a fancy configuration language, I don't see
the need. (Except for Valgrind memory leak detection.)

Greetings,
Maxime.





* Re: Need help embedding Guile
  2021-12-22 11:41       ` Dimitris Papavasiliou
                           ` (2 preceding siblings ...)
  2021-12-22 15:29         ` Maxime Devos
@ 2021-12-22 17:37         ` Mikael Djurfeldt
  2021-12-23  9:40           ` Mikael Djurfeldt
  2021-12-29 16:26           ` Olivier Dion via General Guile related discussions
  3 siblings, 2 replies; 16+ messages in thread
From: Mikael Djurfeldt @ 2021-12-22 17:37 UTC (permalink / raw)
  To: Dimitris Papavasiliou; +Cc: guile-user

Hi,

I think you should give up on the idea of having the GC collect all
objects. Because the GC is conservative, that can't be guaranteed to
happen. Any attempt to wait until all objects are collected runs the
risk of hanging indefinitely.

Here's a suggestion for a less "heavy-handed" approach than to kill a child
process. Take it with a grain of salt since I'm not entirely up-to-date
with Guile:

First, skip Guile finalizers. You could then let the responsibility of
finalizing and freeing the C++ objects be shared between Guile and your C++
application. For example, the C++ application could have a doubly linked
list of the C++ objects. When Guile collects an object, make it unlink it
from the C++ list. Then, when you want to enter your second phase, you can
go through the list, which now only contains the objects not yet collected,
and finalize them. This creates the following problem: What if some Guile
code runs *after* you have finalized your remnant objects? Then, these
objects would be interpreted as live, but they are, in fact, dead and could
cause a segmentation fault. So, you either need to make sure that no Guile
code runs after this point (maybe somewhat ugly, since it leaves the Guile
data structures in an inconsistent state), or only allow Guile to
deallocate the first level C++ data structure. That data structure could
then, e.g., contain a flag indicating whether the object is dead or alive.
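
A rough sketch of this scheme in C++, assuming finalizers run on the same thread as everything else (for instance with automatic finalization disabled, as in the original code) and that the registry outlives any late finalizer call.  `Payload', `Wrapper' and `Registry' are hypothetical: `Wrapper' plays the role of the first-level structure Guile is allowed to free, and a null payload serves as the "dead" flag.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <utility>

// Hypothetical stand-in for a C++ object exposed to Scheme.
struct Payload {
    virtual ~Payload() = default;
};

// First-level structure the foreign object points to. Guile only ever
// frees this wrapper; a null payload means the application already
// destroyed the object ("dead").
struct Wrapper {
    std::unique_ptr<Payload> payload;
    std::list<Wrapper *>::iterator link;
    bool linked = false;
};

class Registry {
public:
    // Called when the Scheme side creates a foreign object.
    Wrapper *adopt(std::unique_ptr<Payload> p)
    {
        Wrapper *w = new Wrapper;
        w->payload = std::move(p);
        w->link = live_.insert(live_.end(), w);
        w->linked = true;
        return w;
    }

    // Called from the GC finalizer: unlink the wrapper if it is still in
    // the list, then free it along with any payload it still owns.
    void collected(Wrapper *w)
    {
        if (w->linked)
            live_.erase(w->link);
        delete w;
    }

    // Called once phase one is over: destroy every payload the GC has not
    // reached. Wrappers stay allocated (Guile may still free them later)
    // but are now marked dead. Returns how many payloads were destroyed.
    std::size_t finalize_remaining()
    {
        std::size_t n = live_.size();
        for (Wrapper *w : live_) {
            w->payload.reset();
            w->linked = false;
        }
        live_.clear();
        return n;
    }

    std::size_t live_count() const { return live_.size(); }

private:
    std::list<Wrapper *> live_;
};
```

Any Scheme-facing procedure would then check `w->payload' before using it, which covers the case of Guile code running after the remnant objects have been finalized.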

All of this indicates that it could be nice to have some kind of Guile
shutdown call in the C API. Such a shutdown call could go through live
objects and free them.

Best regards,
Mikael

On Wed, Dec 22, 2021 at 12:47 PM Dimitris Papavasiliou <
dpapavas@protonmail.ch> wrote:

> Hi Olivier,
>
> Thanks for chiming in!
>
> On Wednesday, December 22nd, 2021 at 12:39 AM, Olivier Dion wrote:
>
> > From this description, what I understand is that you want to use Scheme
> > as a configuration file for batching the operations to be done in a
> > second phase in C++.
>
> From an architectural point of view, the situation is probably similar
> to simple configuration parsing, as you say.  In practice though, the
> scope of Guile's role won't be nearly as limited as what this might
> imply.
>
> In a way, what I'm working on, can be described as a "compiler" for
> geometry.  That is to say, it reads in the description of geometry in
> some language and then evaluates it and dumps it in a format that can
> be used for e.g. 3D printing.  Guile (amongst other options) supplies
> the language frontend.  For example, one might invoke it on the shell
> like this:
>
> $ compile hollow_cube.scm
>
> Here, `hollow_cube.scm' would contain, say:
>
> (write-stl
>   "hollow_cube.stl"
>   (difference
>     (cube 2 2 2)
>     (cube 1 1 1)))
>
> When the `cube' procedure is evaluated, it creates a node in the C++
> backend that evaluates to the geometry of a cube.  Similarly
> `difference' creates a node, with edges to the two cubes, which when
> evaluated computes their boolean difference and so on.  Of course, in
> this instance, the Scheme code amounts to little more than what might
> be termed a configuration file, but imagine the code to create a
> complex mechanical part and the situation is (conceptually) quite
> different.
>
> Creating the complete graph before evaluation begins in the second
> phase is probably not necessary (nodes could be evaluated as they're
> created), but it creates the opportunity for certain optimizations
> (like dead code elimination for instance).  This makes some sort of
> forcing/ensuring that Guile has terminated desirable.
>
> > However, I failed to see why you need to finalize these objects
> > since you're going to use them in your second phase?
>
> Foreign objects currently come in two categories:
>
> 1. Complete geometric operations such as `cube' and `difference'
>    above.  These are allocated on the C++ side and a so-called "smart
>    pointer" (shared_ptr) is exported to Scheme.  Failure to finalize
>    this retains a reference on the C++ side, which would prevent
>    destroying the operation.  Since these can get quite large in terms
>    of memory, destroying them after they're no longer needed can be
>    essential.
>
> 2. "Complex" arguments to the above operations such as
>    transforamtions, planes, points, etc. which are C++ classes, for
>    which the destructor needs to be called in one way or the other.
>    These are generally passed by value to the operations and can
>    therefore be freed once the first phase is complete.
>
> > One way I think of would be to fork the process and create your C++
> > objects in a shared memory area between the parent and the child. Once
> > Guile is done reading your inputs, the child process dies and all its
> > memory is reclaimed by the OS.
>
> I have considered that but, besides the fact that it seems a bit
> heavy-handed, it still won't allow me to make sure the C++ objects are
> properly finalized.  This creates two problems: for one, some object
> might need finalization that's more than just freeing memory and I'd
> have to make sure that this isn't the case for each specific type
> (which typically come from external libraries).  And for another,
> these would show up as leaks in tools such as Valgrind, which would
> make checking for memory bugs difficult.
>
> One idea would be to simply call `scm_gc()' and `scm_run_finalizers()'
> until the latter returns 0.  As far as I can see, this should ensure
> all finalizers are called, assuming no references to any foreign
> objects remain, but I see no way of ensuring the latter short of
> process termination...
>
> Dimitris
>
>
>
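A minimal sketch of the first category of foreign object, assuming the common approach of boxing a heap-allocated copy of the `shared_ptr' in the foreign object's slot. The names `Operation', `box_operation', `unbox_operation' and `finalize_operation' are hypothetical. In a real embedding the box would be stored with `scm_make_foreign_object_1' and read back with `scm_foreign_object_ref', and `finalize_operation' would be called from the finalizer passed to `scm_make_foreign_object_type'. Those calls are left out so the sketch builds without libguile.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C++ operation node, standing in for e.g. a cube.
struct Operation {
    int id;
};

// A foreign object slot holds a raw pointer, so the shared_ptr itself is
// boxed on the heap and the box is what Guile ends up owning.
using OperationRef = std::shared_ptr<Operation>;

// Create the box that would be stored in the foreign object slot.
void *box_operation(const OperationRef &op) { return new OperationRef(op); }

// Get the operation back out of a slot value.
Operation &unbox_operation(void *slot) {
    assert(slot != nullptr);
    return **static_cast<OperationRef *>(slot);
}

// Body of the finalizer: deleting the box drops Scheme's reference.
// If this never runs, the use count never falls and the operation leaks.
void finalize_operation(void *slot) { delete static_cast<OperationRef *>(slot); }
```

The use count makes the problem visible: until the finalizer runs, Scheme's copy keeps the operation alive regardless of what the C++ side does.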



* Re: Need help embedding Guile
  2021-12-22 13:52         ` Thien-Thi Nguyen
@ 2021-12-22 22:05           ` Dimitris Papavasiliou
  0 siblings, 0 replies; 16+ messages in thread
From: Dimitris Papavasiliou @ 2021-12-22 22:05 UTC (permalink / raw)
  To: Thien-Thi Nguyen, olivier.dion, maximedevos, mikael; +Cc: guile-user

Thanks to everybody for their suggestions.  I'll respond to all in this single
message to keep the discussion from spreading out too much.  Please let me know
if this is inconvenient for you.  I also apologize in advance for my long
messages.  There's a TL;DR of sorts in the last 3 paragraphs.

Let me start by noting that there are really two distinct, though connected,
problems:

1. Handling garbage collection

This problem is tractable, the only question being how best to handle it.  As I
neglected to say explicitly, but as Olivier pointed out:

On Wednesday, December 22nd, 2021 at 4:46 PM, Olivier Dion wrote:

> Since you have a graph of all the primitives in the second phase, you're
> basically doing garbage collection there.

So, yes, for this class of foreign object, I can essentially simply pass plain
pointers to Guile, and let it go about its business with garbage collection as
it sees fit.  I keep these objects in a graph anyway and already handle freeing
them after phase 2.  Then:

> If I understood, objects can be garbage before phase 2, thus not
> appearing in the final graph of operations.

This class of foreign objects is indeed used during the first phase and needs
to be finalized once that phase is finished.  These can be handled as above, by passing
plain pointers to Guile, and keeping tabs on them to finalize them explicitly.
This is less than ideal, because all objects will be kept live until the Scheme
code terminates and although these are typically small, there's no guarantee
that there won't be very many of them.  But this might be alleviated to some
extent by combining our own collection with that of Guile, e.g. as Mikael
suggests:

On Wednesday, December 22nd, 2021 at 7:37 PM, Mikael Djurfeldt wrote:

> For example, the C++ application could have a doubly linked list of the C++
> objects. When Guile collects an object, make it unlink it from the C++
> list. Then, when you want to enter your second phase, you can go through the
> list, which now only contains the objects not yet collected, and finalize
> them.

Although this would require relying on finalizers, it would no longer be
necessary that every single object has its finalizer called; just that most do,
so that not too much memory is wasted.  This seems to be the idea behind
finalizers in the BDW-GC, as far as I could see from its documentation.
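The list-based scheme Mikael describes might look roughly like the following sketch; `Tracked' and `Registry' are hypothetical names. Each C++ object handed to Guile links itself into an intrusive doubly linked list, the Guile finalizer (when it does run) calls `on_collected', and `sweep' destroys whatever is left once phase 1 is over.

```cpp
#include <cassert>
#include <cstddef>

// Base class for C++ objects exposed to Guile; the prev/next links make
// the registry an intrusive doubly linked list, so unlinking is O(1).
struct Tracked {
    Tracked *prev = nullptr;
    Tracked *next = nullptr;
    virtual ~Tracked() = default;
};

class Registry {
public:
    // Register a newly created object (called when it is wrapped for Guile).
    void link(Tracked *t) {
        t->prev = nullptr;
        t->next = head_;
        if (head_)
            head_->prev = t;
        head_ = t;
        ++count_;
    }

    // Remove an object from the list without destroying it.
    void unlink(Tracked *t) {
        assert(count_ > 0);
        if (t->prev)
            t->prev->next = t->next;
        else
            head_ = t->next;
        if (t->next)
            t->next->prev = t->prev;
        t->prev = t->next = nullptr;
        --count_;
    }

    // What the foreign-object finalizer would do when Guile collects t.
    void on_collected(Tracked *t) {
        unlink(t);
        delete t;
    }

    // End of phase 1: destroy every object Guile has not collected.
    // Returns how many objects were destroyed here.
    std::size_t sweep() {
        std::size_t destroyed = 0;
        while (head_) {
            Tracked *t = head_;
            unlink(t);
            delete t;
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t size() const { return count_; }

private:
    Tracked *head_ = nullptr;
    std::size_t count_ = 0;
};
```

This inherits the caveat Mikael raises: if Guile ran a finalizer on an object after `sweep' had already destroyed it, the object would be deleted twice, so either the slot must be cleared first or no Guile code may run after the sweep.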

This then leaves the more substantial difficulty:

2. Making sure Guile has terminated after phase 1

First of all, this is related to the previous problem.  Although it *is* true
that the GC is conservative, this is not the ultimate reason why it is not
possible to deterministically ensure that all objects are collected and
finalized.  As far as I can see, the ultimate reason is that the GC in use by
Guile works under the assumption that it will be in charge until process exit,
at which point collection becomes unnecessary, as the OS will take over. If it
were possible to tell Guile to shut down and clean up, the GC would know that
all tracked objects are now up for collection as there's no-one left to use
them.  This is possible with Lua, for instance, another language meant to be
embedded, where the Lua state can be closed (with `lua_close'), at which point all objects are
collected.  This allowed me to embed Lua without much trouble.

This also precludes this suggestion:

On Wednesday, December 22nd, 2021 at 3:52 PM, Thien-Thi Nguyen wrote:
> Do guardians help for this?

Alas no, because a) as far as I can tell guarded objects still rely on the GC
to tell whether they are collectable and its conservatism will still create
problems, but more importantly b) I can see no documented way to sever all
references the Scheme code might have made to the foreign objects (but see more
below).

On this issue Olivier suggests:

On Wednesday, December 22nd, 2021 at 4:46 PM, Olivier Dion wrote:

> One way I think you could do this is to evaluate all the user operations
> in a sandbox environment.

> If SEVER-MODULE? is true (the default), the module will be unlinked
> from the global module tree after the evaluation returns, to allow MOD
> to be garbage-collected.

This is interesting, but sandboxed environments turn out to be too restrictive
and not meant for this purpose.  As far as I could tell, I cannot even load code
from within:

scheme@(guile-user)> (use-modules (ice-9 sandbox))
scheme@(guile-user)> (eval-in-sandbox '(load "test.scm"))
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
Unbound variable: load

Inspired by the `sever-module?' argument I tried severing the default module, as
returned by `scm_current_module' and `scm_interaction_environment' like this:

    /* Look up the private `sever-module!' procedure of (ice-9 sandbox)
       and apply it to ENV. */
    scm_call_1(
        scm_variable_ref(
            scm_c_private_variable("ice-9 sandbox", "sever-module!")),
        env);

This seemed to work (the severing part), but didn't help in allowing collection
of e.g. foreign objects bound to global variables, presumably because other
references are kept on the default interaction environment.

I also tried creating and then severing a custom-built r5rs environment (made
with `scheme-report-environment'), but this couldn't even be severed.

More out of spite than anything, I tried to clear the default module.  Noting
that a module is really a structure (although I have only a very hazy idea what
this really means), I tried:

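    /* Implementation-dependent hack: overwrite the first two fields of
       the current module's underlying struct. */
    SCM env = scm_current_module();
    scm_struct_set_x(env, scm_from_int(0), SCM_EOL);
    scm_struct_set_x(env, scm_from_int(1), SCM_BOOL_F);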

Lo and behold, this succeeded in allowing all objects to be collected!  But one
might say: what of it?  This is still a hack, depending on implementation
details.

But there's a bigger (in my view at least) issue here.  As Mikael notes:

On Wednesday, December 22nd, 2021 at 7:37 PM, Mikael Djurfeldt wrote:

> This creates the following problem: What if some Guile code runs *after* you
> have finalized your remnant objects? [...] All of this indicates that it could
> be nice to have some kind of Guile shutdown call in the C API. Such a shutdown
> call could go through live objects and free them.
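One way to make late-running Guile code harmless, along the lines of Mikael's suggestion that Guile free only a first-level structure carrying a dead/alive flag, is sketched below. In place of an explicit boolean it uses a `std::weak_ptr', whose expiry plays the role of the flag. `Operation', `Handle', `Arena', `operation_id' and `finalize_handle' are hypothetical names, and the Guile side (storing the `Handle *' in the foreign object slot and calling `finalize_handle' from its finalizer) is omitted.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical payload standing in for a geometric operation.
struct Operation {
    int id;
};

// First-level structure owned by Guile: the foreign object slot would point
// to a Handle, and the Guile finalizer frees only the Handle.
struct Handle {
    std::weak_ptr<Operation> target;  // expired means "dead"
};

// Application-side owner that keeps every operation alive during phase 1.
class Arena {
public:
    Handle *wrap(int id) {
        auto op = std::make_shared<Operation>(Operation{id});
        owned_.push_back(op);
        return new Handle{op};
    }

    // End of phase 1: drop the arena's references; handles whose operation
    // is not referenced elsewhere now read as dead.
    void release_all() { owned_.clear(); }

private:
    std::vector<std::shared_ptr<Operation>> owned_;
};

// What a Scheme-visible accessor would do: check liveness before use.
// Returns -1 for a dead handle instead of touching freed memory.
int operation_id(const Handle *h) {
    assert(h != nullptr);
    if (std::shared_ptr<Operation> op = h->target.lock())
        return op->id;
    return -1;
}

// What the Guile finalizer would do: free the handle, never the payload.
void finalize_handle(Handle *h) { delete h; }
```

In the full program the graph would hold its own `shared_ptr's to the operations it needs, so only operations that nothing else references would read as dead after `release_all'.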

On the same matter, Maxime said:

On Wednesday, December 22nd, 2021 at 5:29 PM, Maxime Devos wrote:
> > This makes some sort of
> > forcing/ensuring that Guile has terminated desirable.
>
> ... but I don't see how this follows. The only benefit I see from
> ensuring Guile terminates, is freeing a little memory. But since the
> Guile is basically used as a fancy configuration language, I don't see
> the need. (Except for valgrind memory leak detection.)

(Again, this need not be just a little memory.  As a simple illustration, if
the program makes a geometry made out of a 3D grid of 10 * 10 * 10 cubes say, it
will have to allocate 1000 transformation objects to translate the cubes into
place, which will be retained needlessly.  Worse situations are easily
imaginable, depending on how optimistic one is.)

But the real issue is the one brought up by Mikael.  Guile is quite large, with
many features and quite complex control flow mechanisms.  As long as Guile is up
and running, one can't be really sure that it won't somehow interfere with the
execution of the embedding program when it shouldn't (in phase 2 in my case) and
in ways that are not predictable.

This may be no more than a psychological problem, a mere pseudo-concern, but
I'm not certain of that.  Some user might have the main code start threads for
instance, which persist past the point of its return and while that's easily
fixed by joining all threads before phase 2 say, other such issues may not be as
tractable.  Embedding Guile requires effort and having the possibility of
discovering hard problems late in the game, is not entirely insignificant in
this respect.

So TL;DR:  I think the issue boils down to whether it is possible to shut down
Guile and have it clean up before process exit.  If this is not currently
possible, another interesting question might be how well such a feature would fit
into Guile's current design and whether it would be desirable to implement it.
I would argue that, although perhaps not indispensable, it would certainly not
be unnecessary for a language specifically designed to be embedded.

Dimitris

PS: Let me know if you think I should start a new thread for this.





* Re: Need help embedding Guile
  2021-12-22 15:29         ` Maxime Devos
@ 2021-12-22 22:14           ` Dimitris Papavasiliou
  0 siblings, 0 replies; 16+ messages in thread
From: Dimitris Papavasiliou @ 2021-12-22 22:14 UTC (permalink / raw)
  To: Maxime Devos; +Cc: guile-user

On Wednesday, December 22nd, 2021 at 5:29 PM, Maxime Devos wrote:

> This seems clear to me (I think you mean that there's some guile code
> that preprocesses the node graph to optimise it a little, before
> evaluating things). I don't see what dead code elimination has to do
> with the structure of your program though (it's a guile thing, not a
> node graph thing?).

Yes, you understand essentially correctly.  Dead code elimination was just the
first example I could think of:  If a certain cluster of nodes within the graph
has no edges to nodes that output the results (say because the user has commented
out the Scheme code that writes one of the output files) then the operations
concerned need not be evaluated.
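The reachability check described here can be sketched as a backwards traversal from the output nodes. This is a minimal sketch; the `Node' layout and `live_nodes' are hypothetical and not the application's actual graph representation. Every node not reached from some output, such as one whose only consumer was a commented-out `write-stl', is dead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node: `inputs` are indices of the nodes it depends on,
// `is_output` marks sinks such as write-stl.
struct Node {
    std::vector<std::size_t> inputs;
    bool is_output = false;
};

// Returns a mask of live nodes, i.e. those reachable from an output node by
// following dependency edges backwards.  Unmarked nodes need not be evaluated.
std::vector<bool> live_nodes(const std::vector<Node> &graph)
{
    std::vector<bool> live(graph.size(), false);
    std::vector<std::size_t> stack;

    // Seed the traversal with every output node.
    for (std::size_t i = 0; i < graph.size(); ++i) {
        if (graph[i].is_output) {
            live[i] = true;
            stack.push_back(i);
        }
    }

    // Depth-first walk over dependencies, marking each node once.
    while (!stack.empty()) {
        std::size_t n = stack.back();
        stack.pop_back();
        for (std::size_t in : graph[n].inputs) {
            assert(in < graph.size());
            if (!live[in]) {
                live[in] = true;
                stack.push_back(in);
            }
        }
    }
    return live;
}
```

For the hollow cube example, the two cubes, the difference and the `write-stl' node are live, while any operation feeding no output is dropped before phase 2.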




* Re: Need help embedding Guile
  2021-12-22 17:37         ` Mikael Djurfeldt
@ 2021-12-23  9:40           ` Mikael Djurfeldt
  2021-12-29 16:26           ` Olivier Dion via General Guile related discussions
  1 sibling, 0 replies; 16+ messages in thread
From: Mikael Djurfeldt @ 2021-12-23  9:40 UTC (permalink / raw)
  To: Dimitris Papavasiliou; +Cc: guile-user

On Wed 22 Dec 2021 at 18:37, Mikael Djurfeldt <mikael@djurfeldt.com> wrote:

> Hi,
>
> I think you should give up the idea to GC collect all objects. Due to the
> GC being of the conservative type, that can't be guaranteed to happen. Any
> attempt to wait until all objects are collected runs the risk of hanging
> indefinitely.
>
> Here's a suggestion for a less "heavy-handed" approach than to kill a
> child process. Take it with a grain of salt since I'm not entirely
> up-to-date with Guile:
>
> First, skip Guile finalizers.
>

Actually, here not being up-to-date bit me. I'm accustomed to extending
Guile using SMOBs. But, as you know, now we are supposed to use the
foreign object interface where "finalizer" corresponds to the SMOB free
function.

So, yes, what I write below then relies on finalizers. Specifically, one of
the tasks of the finalizer is to unlink the object.

Sorry for the confusion.

> You could then let the responsibility of finalizing and freeing the C++
> objects be shared between Guile and your C++ application. For example, the
> C++ application could have a doubly linked list of the C++ objects. When
> Guile collects an object, make it unlink it from the C++ list. Then, when
> you want to enter your second phase, you can go through the list, which now
> only contains the objects not yet collected, and finalize them. This
> creates the following problem: What if some Guile code runs *after* you
> have finalized your remnant objects? Then, these objects would be
> interpreted as live, but they are, in fact, dead and could cause a
> segmentation fault. So, you either need to make sure that no Guile code
> runs after this point (maybe somewhat ugly, since it leaves the Guile data
> structures in an inconsistent state), or only allow Guile to deallocate the
> first level C++ data structure. That data structure could then, e.g.,
> contain a flag indicating whether the object is dead or alive.
>
> All of this indicates that it could be nice to have some kind of Guile
> shutdown call in the C API. Such a shutdown call could go through live
> objects and free them.
>
> Best regards,
> Mikael
>
> On Wed, Dec 22, 2021 at 12:47 PM Dimitris Papavasiliou <
> dpapavas@protonmail.ch> wrote:
>
>> Hi Olivier,
>>
>> Thanks for chiming in!
>>
>> On Wednesday, December 22nd, 2021 at 12:39 AM, Olivier Dion wrote:
>>
>> > From this description, what I understand is that you want to use Scheme
>> > as a configuration file for batching the operations to be done in a
>> > second phase in C++.
>>
>> From an architectural point of view, the situation is probably similar
>> to simple configuration parsing, as you say.  In practice though, the
>> scope of Guile's role won't be nearly as limited as what this might
>> imply.
>>
>> In a way, what I'm working on, can be described as a "compiler" for
>> geometry.  That is to say, it reads in the description of geometry in
>> some language and then evaluates it and dumps it in a format that can
>> be used for e.g. 3D printing.  Guile (amongst other options) supplies
>> the language frontend.  For example, one might invoke it on the shell
>> like this:
>>
>> $ compile hollow_cube.scm
>>
>> Here, `hollow_cube.scm' would contain, say:
>>
>> (write-stl
>>   "hollow_cube.stl"
>>   (difference
>>     (cube 2 2 2)
>>     (cube 1 1 1)))
>>
>> When the `cube' procedure is evaluated, it creates a node in the C++
>> backend that evaluates to the geometry of a cube.  Similarly
>> `difference' creates a node, with edges to the two cubes, which when
>> evaluated computes their boolean difference and so on.  Of course, in
>> this instance, the Scheme code amounts to little more than what might
>> be termed a configuration file, but imagine the code to create a
>> complex mechanical part and the situation is (conceptually) quite
>> different.
>>
>> Creating the complete graph before evaluation begins in the second
>> phase is probably not necessary (nodes could be evaluated as they're
>> created), but it creates the opportunity for certain optimizations
>> (like dead code elimination for instance).  This makes some sort of
>> forcing/ensuring that Guile has terminated desirable.
>>
>> > However, I failed to see why you need to finalize these objects
>> > since you're going to use them in your second phase?
>>
>> Foreign objects currently come in two categories:
>>
>> 1. Complete geometric operations such as `cube' and `difference'
>>    above.  These are allocated on the C++ side and a so-called "smart
>>    pointer" (shared_ptr) is exported to Scheme.  Failure to finalize
>>    this retains a reference on the C++ side, which would prevent
>>    destroying the operation.  Since these can get quite large in terms
>>    of memory, destroying them after they're no longer needed can be
>>    essential.
>>
>> 2. "Complex" arguments to the above operations such as
>>    transformations, planes, points, etc. which are C++ classes, for
>>    which the destructor needs to be called in one way or the other.
>>    These are generally passed by value to the operations and can
>>    therefore be freed once the first phase is complete.
>>
>> > One way I think of would be to fork the process and create your C++
>> > objects in a shared memory area between the parent and the child. Once
>> > Guile is done reading your inputs, the child process dies and all its
>> > memory is reclaimed by the OS.
>>
>> I have considered that but, besides the fact that it seems a bit
>> heavy-handed, it still won't allow me to make sure the C++ objects are
>> properly finalized.  This creates two problems: for one, some object
>> might need finalization that's more than just freeing memory and I'd
>> have to make sure that this isn't the case for each specific type
>> (which typically come from external libraries).  And for another,
>> these would show up as leaks in tools such as Valgrind, which would
>> make checking for memory bugs difficult.
>>
>> One idea would be to simply call `scm_gc()' and `scm_run_finalizers()'
>> until the latter returns 0.  As far as I can see, this should ensure
>> all finalizers are called, assuming no references to any foreign
>> objects remain, but I see no way of ensuring the latter short of
>> process termination...
>>
>> Dimitris
>>
>>
>>
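The `scm_gc()'/`scm_run_finalizers()' idea quoted above, combined with Mikael's warning that waiting for every object to be collected may hang indefinitely, suggests bounding the number of rounds. A minimal sketch, assuming (as the quoted text does) that `scm_run_finalizers()' returns the number of finalizers it ran. It may also be necessary to disable automatic finalization with `scm_set_automatic_finalization_enabled' so that finalizers run in the calling thread; check the manual for the Guile version in use. The two calls are passed in as callables so the loop can be tried without libguile, and `drain_finalizers' is a hypothetical name.

```cpp
#include <cassert>
#include <functional>

// Alternate full collections and finalizer runs until a round runs no
// finalizers, giving up after max_rounds.  In an embedding, `collect` would
// wrap scm_gc() and `run_finalizers` would wrap scm_run_finalizers().
// Returns true if a round ran no finalizers, false if the bound was hit.
bool drain_finalizers(const std::function<void()> &collect,
                      const std::function<int()> &run_finalizers,
                      int max_rounds)
{
    assert(max_rounds > 0);
    for (int round = 0; round < max_rounds; ++round) {
        collect();
        if (run_finalizers() == 0)
            return true;
    }
    return false;
}
```

A `true' result only means that one round ran no finalizers. With a conservative collector it does not prove that every foreign object was collected, which is the crux of the problem discussed in the rest of the thread.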



* Re: Need help embedding Guile
  2021-12-22 17:37         ` Mikael Djurfeldt
  2021-12-23  9:40           ` Mikael Djurfeldt
@ 2021-12-29 16:26           ` Olivier Dion via General Guile related discussions
  1 sibling, 0 replies; 16+ messages in thread
From: Olivier Dion via General Guile related discussions @ 2021-12-29 16:26 UTC (permalink / raw)
  To: Mikael Djurfeldt, Dimitris Papavasiliou
  Cc: Maxime Devos, guile-user, Mikael Djurfeldt

On Wed, 22 Dec 2021, Mikael Djurfeldt <mikael@djurfeldt.com> wrote:

> All of this indicates that it could be nice to have some kind of Guile
> shutdown call in the C API. Such a shutdown call could go through live
> objects and free them.

Indeed, that's a cool feature.  Not many libraries are able to do so,
though, and I think, for example, that libgc might be a problem there.

-- 
Olivier Dion
Polymtl




end of thread

Thread overview: 16+ messages
2021-12-21 11:12 Need help embedding Guile Dimitris Papavasiliou
2021-12-21 11:37 ` Maxime Devos
2021-12-21 13:45   ` Maxime Devos
2021-12-21 13:48   ` Maxime Devos
2021-12-21 21:50   ` Dimitris Papavasiliou
2021-12-21 22:39     ` Olivier Dion via General Guile related discussions
2021-12-22 11:41       ` Dimitris Papavasiliou
2021-12-22 13:52         ` Thien-Thi Nguyen
2021-12-22 22:05           ` Dimitris Papavasiliou
2021-12-22 14:46         ` Olivier Dion via General Guile related discussions
2021-12-22 15:23           ` Maxime Devos
2021-12-22 15:29         ` Maxime Devos
2021-12-22 22:14           ` Dimitris Papavasiliou
2021-12-22 17:37         ` Mikael Djurfeldt
2021-12-23  9:40           ` Mikael Djurfeldt
2021-12-29 16:26           ` Olivier Dion via General Guile related discussions
