Re: ffi docs - Andy Wingo

From: Andy Wingo <wingo@pobox.com>
To: Neil Jerram <neil@ossau.uklinux.net>
Cc: guile-devel <guile-devel@gnu.org>
Subject: Re: ffi docs
Date: Fri, 16 Apr 2010 11:33:07 +0200
Message-ID: <m3eiif214c.fsf@pobox.com>
In-Reply-To: <87ljco1gyc.fsf@ossau.uklinux.net> (Neil Jerram's message of "Thu, 15 Apr 2010 23:36:27 +0100")

Hi,

Thanks for the feedback!

On Fri 16 Apr 2010 00:36, Neil Jerram <neil@ossau.uklinux.net> writes:

>>    But yet we as programmers live in both worlds, and Guile itself is
>> half implemented in C. So it is that Guile's living half pays respect
>> to its dead counterpart, via a spectrum of interfaces to C ranging from
>> dynamic loading of Scheme primitives to dynamic binding of stock C
>> library prodedures.
>
> c -----------^

What does this mean?

>>    We titled this section "foreign libraries" because although the name
>> "foreign" doesn't leak into the API, the world of C really is foreign
>> to Scheme - and that estrangement extends to components of foreign
>> libraries as well, as we see in future sections.
>
> I'm not sure what the message is here.

Probably me being too cutesy, I would imagine. The facility is typically
called a "foreign function interface", but that name doesn't appear in
e.g. "dynamic-link", so I was trying to explain.

Beyond that I guess I meant to say that "native" depends on where you're
coming from; that Scheme calls are native to Scheme, and C calls are
foreign to Scheme.

>>  -- Scheme Procedure: dynamic-link [library]
>>  -- C Function: scm_dynamic_link (library)
>
> Code below implies that library can be omitted, and that this -
> i.e. '(dynamic-link)' - means to return an object representing libguile
> itself.  Should that be mentioned in the following doc?
>
>>      Find the shared library denoted by LIBRARY (a string) and link it
>>      into the running Guile application.  When everything works out,
>>      return a Scheme object suitable for representing the linked object
>>      file.  Otherwise an error is thrown.  How object files are
>>      searched is system dependent.
>>
>>      Normally, LIBRARY is just the name of some shared library file
>>      that will be searched for in the places where shared libraries
>>      usually reside, such as in `/usr/lib' and `/usr/local/lib'.
>>
>>      When LIBRARY is omitted, a "global symbol handle" is returned.
>>      This handle provides access to the symbols available to the
>>      program at run-time, including those exported by the program
>>      itself and the shared libraries already loaded.

I think it is mentioned, no? Is there a way that it can be more clear?

>>    Given some set of C extensions to Guile, the next logical step is to
>> integrate these glue libraries into the module system of Guile so that
>> you can load new primitives into a running system just as you can load
>> new Scheme code.
>>
>>  -- Scheme Procedure: load-extension lib init
>>  -- C Function: scm_load_extension (lib, init)
>>      Load and initialize the extension designated by LIB and INIT.
>>      When there is no pre-registered function for LIB/INIT, this is
>>      equivalent to
>>
>>           (dynamic-call INIT (dynamic-link LIB))
>>
>>      When there is a pre-registered function, that function is called
>>      instead.
>>
>>      Normally, there is no pre-registered function.  This option exists
>>      only for situations where dynamic linking is unavailable or
>>      unwanted.  In that case, you would statically link your program
>>      with the desired library, and register its init function right
>>      after Guile has been initialized.
>
> Should there be a reference from here to wherever the registration API
> is covered?

Probably. Is it documented somewhere? :) I think no. I would doc it
here, fwiw...

>>      LIB should be a string denoting a shared library without any file
>>      type suffix such as ".so".  The suffix is provided automatically.
>>      It should also not contain any directory components.  Libraries
>>      that implement Guile Extensions should be put into the normal
>>      locations for shared libraries.  We recommend using the naming
>>      convention libguile-bla-blum for an extension related to a module
>>      `(bla blum)'.
>
> I believe this will shortly be out of date, won't it? - given our desire
> to support parallel installs.

Hm, good point; though if it is installed into the extensionsdir as
suggested below, we do work around this issue.

>>    A compiled module should have a specially named "module init
>> function".  Guile knows about this special name and will call that
>> function automatically after having linked in the shared library.  For
>> our example, we replace `init_math_bessel' with the following code in
>> `bessel.c':
>>
>>      void
>>      init_math_bessel (void *unused)
>>      {
>>        scm_c_define_gsubr ("j0", 1, 0, 0, j0_wrapper);
>>        scm_c_export ("j0", NULL);
>>      }
>>
>>      void
>>      scm_init_math_bessel_module ()
>>      {
>>        scm_c_define_module ("math bessel", init_math_bessel, NULL);
>>      }
>>
>>    The general pattern for the name of a module init function is:
>> `scm_init_', followed by the name of the module where the individual
>> hierarchical components are concatenated with underscores, followed by
>> `_module'.
>
> Is this still correct?  IIUC it only makes sense as part of the ability
> we once had for a (use-modules (...)) call to find a .so and bootstrap
> it automatically.  (Unless that has been reinstated...)

It has not been reinstated. However SWIG seems to use this facility --
doing a (load-extension ...) call to load up a module, then you use the
module.

>>    Presently there's no convention for having a Guile version number in
>> module C code filenames or directories.  This is primarily because
>> there are no established principles for two versions of Guile to be
>> installed under the same prefix (e.g. both under `/usr').  Assuming
>> upward compatibility is maintained then this should be unnecessary, and
>> if compatibility is not maintained then it's highly likely a package
>> will need to be revisited anyway.
>>
>>    The present suggestion is that modules should assume when they're
>> installed under a particular `prefix' that there's a single version of
>> Guile there, and the `guile-config' at build time has the necessary
>> information about it.  C code or Scheme code might adapt itself
>> accordingly (allowing for features not available in an older version
>> for instance).
>
> I guess this also needs updating, for the new parallel install vision.

Probably; there is $extensionsdir, but I am not finding it in this
chapter; durnit. Here's the NEWS entry:

** Dynamically loadable extensions may be placed in a Guile-specific path

Before, Guile only searched the system library paths for extensions
(e.g. /usr/lib), which meant that the names of Guile extensions had to
be globally unique. Installing them to a Guile-specific extensions
directory is cleaner. Use `pkg-config --variable=extensionsdir
guile-2.0' to get the location of the extensions directory.


>> 0.1.5 Foreign Pointers
>> ----------------------
>>
>> The previous sections have shown how Guile can be extended at runtime by
>> loading compiled C extensions. This approach is all well and good, but
>> wouldn't it be nice if we didn't have to write any C at all? This
>> section takes up the problem of accessing C values from Scheme, and the
>> next discusses C functions.
>>
>> 0.1.5.1 Foreign Types
>> .....................
>>
>> The first impedance mismatch that one sees between C and Scheme is that
>> in C, the storage locations (variables) are typed, but in Scheme types
>> are associated with values, not variables. *Note Values and Variables::.
>
> Fine, but...
>
>>    So when accessing a C value through a Scheme pointer, we must give
>> the type of the pointed-to value explicitly, as a parameter to any
>> Scheme procedure that accesses the value.
>
> This confused me at first.  I think I understand the point now, but
>
> - isn't it actually much more to do with the ELF binary format, rather
>   than with C?  If libguile could read and parse C, it would be able to
>   infer the type of any variable that the Scheme layer might request.
>   The problem is precisely that what we are linking with is *not* C
>   anymore...  It's just untyped pointers.

I guess you're right, this is confusing. C doesn't really exist at
runtime, and this API is all about accessing runtime values.
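
To make that concrete, here is a minimal C sketch of what the runtime
actually offers, assuming a POSIX dlopen/dlsym (older glibc versions need
-ldl at link time). The helper names lookup_global and read_global_int
are made up for illustration, and Guile's own lookup path may differ; the
only point is that the linker hands back a bare address and the caller
has to supply the type.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look NAME up in the global symbol scope, the
   rough analogue of (dynamic-link) with no argument.  The linker hands
   back a bare address; nothing records what type lives there.  */
void *
lookup_global (const char *name)
{
  assert (name != NULL);
  void *self = dlopen (NULL, RTLD_LAZY);  /* handle for the whole program */
  if (self == NULL)
    return NULL;
  return dlsym (self, name);              /* untyped address, or NULL */
}

/* The caller asserts the type.  Reading the symbol through the wrong
   type is a bug that neither the compiler nor the linker can catch.  */
int
read_global_int (const char *name, int fallback)
{
  int *p = lookup_global (name);
  return p != NULL ? *p : fallback;
}
```

For instance, read_global_int ("optind", -1) reads the int that POSIX
getopt declares, provided the C library exports it as a dynamic symbol.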

> - I think "give the type ... as a parameter to any Scheme procedure that
>   accesses the value" is misleading, because we don't do that!  Rather,
>   we construct a box that includes both the pointer and the type, and
>   then pass the box around.

True, though there are void pointers, which can be treated as raw memory
arrays, and parsed with the bytevector functions. But agreed, "as a
parameter" is incorrect.
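
As a rough illustration of the "box" idea, and not Guile's actual
representation, the following C sketch pairs an address with the type
declared for it when the box was made. All names here (struct foreign,
make_foreign, foreign_ref_int32, the FOREIGN_* tags) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; the real set lives in (system foreign).  */
enum foreign_type { FOREIGN_VOID, FOREIGN_INT32, FOREIGN_DOUBLE };

/* The "box": an address plus the type its creator declared for it.  */
struct foreign
{
  void *address;
  enum foreign_type type;
};

struct foreign
make_foreign (void *address, enum foreign_type type)
{
  struct foreign f = { address, type };
  return f;
}

/* Dereference according to the recorded tag, so callers never repeat
   the type.  A void pointer has no value to read; it has to be treated
   as raw bytes instead.  Returns 1 on success, 0 on a tag mismatch.  */
int
foreign_ref_int32 (struct foreign f, int32_t *out)
{
  assert (out != NULL);
  if (f.type != FOREIGN_INT32 || f.address == NULL)
    return 0;                   /* wrong tag: refuse to reinterpret */
  *out = *(int32_t *) f.address;
  return 1;
}
```

The type is given once, at construction, and travels with the pointer
afterwards, which is the behaviour Neil describes.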

>> 0.1.5.2 Foreign Variables
>> .........................
>>
>> Given the types defined in the previous section, C pointers may be
>> looked up dynamically using `dynamic-pointer'.
>>
>>  -- Scheme Procedure: dynamic-pointer name type dobj [len]
>>  -- C Function: scm_dynamic_pointer (name, type, dobj, len)
>>      Return a "handle" for the pointer NAME in the shared object
>>      referred to by DOBJ. The handle aliases a C value, and is declared
>>      to be of type TYPE. Valid types are defined in the `(system
>>      foreign)' module.
>>
>>      This facility works by asking the dynamic linker for the address
>>      of a symbol, then assuming that it aliases a value of a given
>>      type. Obviously, the user must be very careful to ensure that the
>>      value actually is of the declared type, or bad things will happen.
>>
>>      Regardless whether your C compiler prepends an underscore `_' to
>>      the global names in a program, you should *not* include this
>>      underscore in NAME since it will be added automatically when
>>      necessary.
>>
>>    For example, currently Guile has a variable, `scm_numptob', as part
>> of its API. It is declared as a C `long'. So, to create a handle
>> pointing to that foreign value, we do:
>>
>>      (use-modules (system foreign))
>>      (define numptob (dynamic-pointer "scm_numptob" long (dynamic-link)))
>>      numptob
>>      => #<foreign int32 8>
>>
>>    A value returned by `dynamic-pointer' is a Scheme wrapper for a C
>> pointer, with additional type information. A foreign pointer prints
>> according to its type. This example showed that a `long' on this
>> platform is an `int32', and that the value pointed to by `numptob' is 8.
>
> I think the terminology is confusing here in two ways.
>
> 1. The API and the doc call these objects pointers, but because of the
> automatic dereference they don't behave like pointers at all.  (Their
> print function prints *p, not p, and foreign-set! does *p = val, not p =
> val.)

That is the case for non-void pointers, yes; but dynamic-pointer does
not give you a value. Perhaps as you mention a "reference" would be less
ambiguous; or perhaps more?

Perhaps we should make these things print like #<foreign-pointer
*0xdeadbeef = (int32)8> or something? (Or, as mentioned below, just as
#<foreign-pointer 0xdeadbeef> ?)

> 2. An object created by '(dynamic-pointer ...)' prints as '#<foreign
> ...>'.  If you think that foreign is the best word for this whole
> area (and I think it's fine), I think you should bite the bullet and
> make all the APIs say 'foreign' instead of 'dynamic'.  (And obviously
> keep the 'dynamic' names of 1.8.x APIs as aliases.)

Hmmmmmmmmmmmmmmmmmmmmm. But you can make foreign pointers in other ways
than from dlsym() -- for example, the return value of a function. I
agree that "dynamic-pointer" is confusing, though, probably
because "pointer" is a noun and not a verb like "link".

Can you think of a better name for "dynamic-pointer"?

>> 0.1.5.3 Void Pointers and Byte Access
>> .....................................
>>
>> As a special case, a dynamic pointer may be declared to point to type
>> `void', in which case it is treated as a void pointer. A void pointer
>> prints its value as a pointer, without dereferencing the pointer.
>>
>>    It's important at this point to conceptually separate foreign values
>> from foreign pointers. `dynamic-pointer' gives you a foreign pointer. A
>> foreign value is the semantic meaning of the bytes pointed to by a
>> pointer. Only foreign pointers may be wrapped in Scheme. One may make a
>> pointer to a foreign value, and wrap that as a Scheme object, but a
>> bare foreign value may not be wrapped.
>
> I'm not getting the distinction here at all.  Is it important for what
> follows?

Maybe not. Perhaps it's just a vestigial remnant of my personal process
of understanding these things. But you haven't gotten to functions yet,
in which foreign values need to be passed as values and not as pointers.

>>    As an example, `(dynamic-pointer "foo" void bar-lib)' links in the
>> FOO symbol in the BAR-LIB library as a pointer to `void': a `void*'.
>>
>>    Void pointers may be accessed as bytevectors.
>>
>>  -- Scheme Procedure: foreign->bytevector foreign [uvec_type [offset
>>           [len]]]
>>  -- C Function: scm_foreign_to_bytevector foreign uvec_type offset len
>>      Return a bytevector aliasing the memory pointed to by FOREIGN.
>>
>>      FOREIGN must be a void pointer, a foreign whose type is VOID. By
>>      default, the resulting bytevector will alias all of the memory
>>      pointed to by FOREIGN, from beginning to end, treated as a `vu8'
>>      array.
>
> It feels like we're missing a unification trick here.
>
> Thought #1: if we have, e.g., an int8 pointer ip, why not just use
> (foreign-ref ip n) to interpret the pointer as pointing to an array, and
> get its nth element?
>
> Thought #2: but if we do that we'll be duplicating the bytevector API.

Right.

> So instead, shouldn't the fundamental operation be (foreign->bytevector
> NAME TYPE LIBRARY [LEN]), and get/set then done using the bytevector
> API?

Perhaps. foreign-ref and foreign-set! aren't actually used anywhere in
Guile, so perhaps they should go. They just seemed convenient. But maybe
convenience shouldn't be a concern of a low-level FFI. I am inclined to
agree with you.
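
For what it's worth, the bytevector-only design would look something
like this in C terms: one aliasing view over the raw memory, and typed
accessors on top of it. This is only a sketch under my own assumptions;
struct byte_view and its functions are hypothetical names, loosely
modelled on bytevector-s32-native-ref and bytevector-s32-native-set!.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view: a window of LEN bytes starting at BASE + OFFSET.
   It aliases the memory rather than copying it.  */
struct byte_view
{
  uint8_t *bytes;
  size_t len;
};

struct byte_view
make_byte_view (void *base, size_t offset, size_t len)
{
  assert (base != NULL);
  struct byte_view v = { (uint8_t *) base + offset, len };
  return v;
}

/* Bounds-checked typed accessors.  memcpy sidesteps alignment
   problems.  Both return 1 on success, 0 if the access would run
   past the end of the view.  */
int
byte_view_s32_ref (struct byte_view v, size_t index, int32_t *out)
{
  if (index > v.len || v.len - index < sizeof *out)
    return 0;
  memcpy (out, v.bytes + index, sizeof *out);
  return 1;
}

int
byte_view_s32_set (struct byte_view v, size_t index, int32_t value)
{
  if (index > v.len || v.len - index < sizeof value)
    return 0;
  memcpy (v.bytes + index, &value, sizeof value);
  return 1;
}
```

With that in place there is no need for a separate ref/set pair per
pointer type; the type shows up only in the name of the accessor.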

> I'm not sure either of those thoughts is right, but the current API
> doesn't feel as elegant as I think it could be.

I agree that it has that kind of "off" feel, but if you read on to
structs and functions, those sections should address your objections.

How are we to handle these changes? I feel like the manual would end up
better if you did it, because my mind is clouded with the
implementation; yours is fresh, and would do a better job explaining.
Also, those docs were quite a slog to write in the first place ;) What
do you think?

Awaiting your next dispatch!

Andy
-- 
http://wingolog.org/
