From: Andy Wingo
Newsgroups: gmane.lisp.guile.devel
Subject: Re: RFC: Foreign objects facility
Date: Mon, 28 Apr 2014 18:08:44 +0200
Message-ID: <877g69308j.fsf@pobox.com>
References: <87bnvm52u6.fsf@pobox.com> <87a9b692ys.fsf@yeeloong.lan>
In-Reply-To: <87a9b692ys.fsf@yeeloong.lan> (Mark H. Weaver's message of
 "Sun, 27 Apr 2014 12:00:59 -0400")
To: Mark H Weaver
Cc: guile-devel
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

On Sun 27 Apr 2014 18:00, Mark H Weaver writes:

> Andy Wingo writes:
>
>> I propose to provide a new interface that will eventually make SMOBs
>> obsolete. This new interface is based on structs with raw fields -- the
>> 'u' fields. (See
>> http://www.gnu.org/software/guile/docs/master/guile.html/Vtables.html#Vtables
>> for description of 'u' fields. Note that the documentation is wrong --
>> these fields are indeed traced by the GC.)
>
> Sounds like a good idea, in general.

Thanks; with this note, Ludo's approbation, and a doc update I feel OK
with pushing to stable-2.0. Let's hammer out the details there.

>> SCM scm_make_foreign_object_1 (SCM type, scm_t_bits val0);
>> SCM scm_make_foreign_object_2 (SCM type, scm_t_bits val0,
>>                                scm_t_bits val1);
>> SCM scm_make_foreign_object_3 (SCM type, scm_t_bits val0,
>>                                scm_t_bits val1, scm_t_bits val2);
>
> If we include an interface like this, I think we should have more of
> these, but see below.

I changed this interface to take void* pointers, with the idea that
that's the common thing. Same with scm_foreign_object_ref.
But I added scm_foreign_object_signed_ref, unsigned_set_x, etc, with
notes in the manual that you can indeed provide fewer initializers than
fields, and then initialize the fields yourself later.

> #ifndef __GNUC__
>   if (bits > SCM_T_SIGNED_BITS_MAX)
>     return -1 - (scm_t_signed_bits) ~bits;
> #endif
>   return (scm_t_signed_bits) bits;
> }

I think I forgot to actually implement this bit though :P

Tutorial-style documentation (in the Programming in C chapter, replacing
the SMOB tutorial) attached. There is also reference-style
documentation in the API chapter.

Andy

--=-=-=
Content-Type: text/plain
Content-Disposition: inline; filename=libguile-foreign-objects.txt

1 Defining New Foreign Object Types
===================================

The "foreign object type" facility is Guile's mechanism for importing
objects and types from C or other languages into Guile's system. If you
have a C 'struct foo' type, for example, you can define a corresponding
Guile foreign object type that allows Scheme code to handle 'struct foo
*' objects.

To define a new foreign object type, the programmer provides Guile with
some essential information about the type -- what its name is, how many
fields it has, and its finalizer (if any) -- and Guile allocates a fresh
type for it. Foreign objects can be accessed from Scheme or from C.

1.1 Defining Foreign Object Types
---------------------------------

To create a new foreign object type from C, call
'scm_make_foreign_object_type'. It returns a value of type 'SCM' which
identifies the new type.
Here is how one might declare a new type representing eight-bit
gray-scale images:

     #include <libguile.h>

     struct image {
       int width, height;
       char *pixels;

       /* The name of this image */
       SCM name;

       /* A function to call when this image is
          modified, e.g., to update the screen,
          or SCM_BOOL_F if no action necessary */
       SCM update_func;
     };

     static SCM image_type;

     void
     init_image_type (void)
     {
       SCM name, slots;
       scm_t_struct_finalize finalizer;

       name = scm_from_utf8_symbol ("image");
       slots = scm_list_1 (scm_from_utf8_symbol ("data"));
       finalizer = NULL;

       image_type =
         scm_make_foreign_object_type (name, slots, finalizer);
     }

The result is an initialized 'image_type' value that identifies the new
foreign object type. The next section describes how to create foreign
objects and how to access their slots.

1.2 Creating Foreign Objects
----------------------------

Foreign objects contain zero or more "slots" of data. A slot can hold a
pointer, an integer that fits into a 'size_t' or 'ssize_t', or a 'SCM'
value.

All objects of a given foreign type have the same number of slots. In
the example from the previous section, the 'image' type has one slot,
because the slots list passed to 'scm_make_foreign_object_type' is of
length one. (The actual names given to slots are unimportant for most
users of the C interface, but can be used on the Scheme side to
introspect on the foreign object.)

To construct a foreign object and initialize its first slot, call
'scm_make_foreign_object_1 (TYPE, FIRST_SLOT_VALUE)'. There are
similarly named constructors for initializing 0, 1, 2, or 3 slots, or
initializing N slots via an array. *Note Foreign Objects::, for full
details. Any fields that are not explicitly initialized are set to 0.

To get or set the value of a slot by index, you can use the
'scm_foreign_object_ref' and 'scm_foreign_object_set_x' functions.
These functions take and return values as 'void *' pointers; there are
corresponding convenience procedures like '_signed_ref',
'_unsigned_set_x' and so on for dealing with slots as signed or unsigned
integers.

Foreign object fields that are pointers can be tricky to manage. If
possible, it is best that all memory that is referenced by a foreign
object be managed by the garbage collector. That way, the GC can
automatically ensure that memory is accessible when it is needed, and
freed when it becomes inaccessible. If this is not the case for your
program -- for example, if you are exposing an object to Scheme that was
allocated by some other, Guile-unaware part of your program -- then you
will probably need to implement a finalizer. *Note Foreign Object
Memory Management::, for more.

Continuing the example from the previous section, if the global variable
'image_type' contains the type returned by
'scm_make_foreign_object_type', here is how we could construct a foreign
object whose "data" field contains a pointer to a freshly allocated
'struct image':

     SCM
     make_image (SCM name, SCM s_width, SCM s_height)
     {
       struct image *image;
       int width = scm_to_int (s_width);
       int height = scm_to_int (s_height);

       /* Allocate the `struct image'. Because we
          use scm_gc_malloc, this memory block will
          be automatically reclaimed when it becomes
          inaccessible, and its members will be traced
          by the garbage collector. */
       image = (struct image *)
         scm_gc_malloc (sizeof (struct image), "image");

       image->width = width;
       image->height = height;

       /* Allocating the pixels with
          scm_gc_malloc_pointerless means that the
          pixels data is collectable by GC, but
          that GC shouldn't spend time tracing its
          contents for nested pointers because there
          aren't any. */
       image->pixels =
         scm_gc_malloc_pointerless (width * height, "image pixels");

       image->name = name;
       image->update_func = SCM_BOOL_F;

       /* Now wrap the struct image* in a new foreign
          object, and return that object. */
       return scm_make_foreign_object_1 (image_type, image);
     }

We use 'scm_gc_malloc_pointerless' for the pixel buffer to tell the
garbage collector not to scan it for pointers. Calls to
'scm_gc_malloc', 'scm_make_foreign_object_1', and
'scm_gc_malloc_pointerless' raise an exception in out-of-memory
conditions; the garbage collector is able to reclaim previously
allocated memory if that happens.

1.3 Type Checking of Foreign Objects
------------------------------------

Functions that operate on foreign objects should check that the passed
'SCM' value indeed is of the correct type before accessing its data.
They can do this with 'scm_assert_foreign_object_type'.

For example, here is a simple function that operates on an image object,
and checks the type of its argument.

     SCM
     clear_image (SCM image_obj)
     {
       int area;
       struct image *image;

       scm_assert_foreign_object_type (image_type, image_obj);

       image = scm_foreign_object_ref (image_obj, 0);
       area = image->width * image->height;
       memset (image->pixels, 0, area);

       /* Invoke the image's update function. */
       if (scm_is_true (image->update_func))
         scm_call_0 (image->update_func);

       return SCM_UNSPECIFIED;
     }

1.4 Foreign Object Memory Management
------------------------------------

Once a foreign object has been released to the tender mercies of the
Scheme system, it must be prepared to survive garbage collection. In
the example above, all the memory associated with the foreign object is
managed by the garbage collector because we used the 'scm_gc_'
allocation functions. Thus, no special care must be taken: the garbage
collector automatically scans them and reclaims any unused memory.

However, when data associated with a foreign object is managed in some
other way--e.g., 'malloc''d memory or file descriptors--it is possible
to specify a "finalizer" function to release those resources when the
foreign object is reclaimed.

As discussed in *note Garbage Collection::, Guile's garbage collector
will reclaim inaccessible memory as needed.
This reclamation process runs concurrently with the main program. When
Guile analyzes the heap and determines that an object's memory can be
reclaimed, that memory is put on a "free list" of objects that can be
reclaimed. Usually that's the end of it--the object is available for
immediate re-use. However some objects can have "finalizers" associated
with them--functions that are called on reclaimable objects to effect
any external cleanup actions.

Finalizers are tricky business and it is best to avoid them. They can
be invoked at unexpected times, or not at all--for example, they are not
invoked on process exit. They don't help the garbage collector do its
job; in fact, they are a hindrance. Furthermore, they perturb the
garbage collector's internal accounting. The GC decides to scan the
heap when it thinks that it is necessary, after some amount of
allocation. Finalizable objects almost always represent an amount of
allocation that is invisible to the garbage collector. The effect can
be that the actual resource usage of a system with finalizable objects
is higher than what the GC thinks it should be.

All those caveats aside, some foreign object types will need finalizers.
For example, if we had a foreign object type that wrapped file
descriptors--and we aren't suggesting this, as Guile already has
ports--then you might define the type like this:

     static SCM file_type;

     static void
     finalize_file (SCM file)
     {
       int fd = scm_foreign_object_signed_ref (file, 0);
       if (fd >= 0)
         {
           scm_foreign_object_signed_set_x (file, 0, -1);
           close (fd);
         }
     }

     static void
     init_file_type (void)
     {
       SCM name, slots;
       scm_t_struct_finalize finalizer;

       name = scm_from_utf8_symbol ("file");
       slots = scm_list_1 (scm_from_utf8_symbol ("fd"));
       finalizer = finalize_file;

       file_type =
         scm_make_foreign_object_type (name, slots, finalizer);
     }

     static SCM
     make_file (int fd)
     {
       return scm_make_foreign_object_1 (file_type, (void *) fd);
     }

Note that the finalizer may be invoked in ways and at times you might
not expect. In particular, if the user's Guile is built with support
for threads, the finalizer may be called from any thread that is running
Guile. In Guile 2.0, finalizers are invoked via "asyncs", which
interleave them with running Scheme code; *note System asyncs::. In
Guile 2.2 there will be a dedicated finalization thread, to ensure that
the finalization doesn't run within the critical section of any other
thread known to Guile.

In either case, finalizers run concurrently with the main program, and
so they need to be async-safe and thread-safe. If for some reason this
is impossible, perhaps because you are embedding Guile in some
application that is not itself thread-safe, you have a few options. One
is to use guardians instead of finalizers, and arrange to pump the
guardians for finalizable objects. *Note Guardians::, for more
information. The other option is to disable automatic finalization
entirely, and arrange to call 'scm_run_finalizers ()' at appropriate
points. *Note Foreign Objects::, for more on these interfaces.

Finalizers are allowed to allocate memory, access GC-managed memory, and
in general can do anything any Guile user code can do.
This was not the case in Guile 1.8, where finalizers were much more
restricted. In particular, in Guile 2.0, finalizers can resuscitate
objects. We do not recommend that users avail themselves of this
possibility, however, as a resuscitated object can re-expose other
finalizable objects that have been already finalized back to Scheme.
These objects will not be finalized again, but they could cause
use-after-free problems to code that handles objects of that particular
foreign object type. To guard against this possibility, robust
finalization routines should clear state from the foreign object, as in
the above 'finalize_file' example.

One final caveat. Foreign object finalizers are associated with the
lifetime of a foreign object, not of its fields. If you access a field
of a finalizable foreign object, and do not arrange to keep a reference
on the foreign object itself, it could be that the outer foreign object
gets finalized while you are working with its field.

For example, consider a procedure to read some data from a file, from
our example above.

     SCM
     read_bytes (SCM file, SCM n)
     {
       int fd;
       SCM buf;
       size_t len, pos;

       scm_assert_foreign_object_type (file_type, file);

       fd = scm_foreign_object_signed_ref (file, 0);
       if (fd < 0)
         scm_wrong_type_arg_msg ("read-bytes", SCM_ARG1,
                                 file, "open file");

       len = scm_to_size_t (n);
       buf = scm_c_make_bytevector (len);

       pos = 0;
       while (pos < len)
         {
           char *bytes = SCM_BYTEVECTOR_CONTENTS (buf);
           ssize_t count = read (fd, bytes + pos, len - pos);
           if (count < 0)
             scm_syserror ("read-bytes");
           if (count == 0)
             break;
           pos += count;
         }

       scm_remember_upto_here_1 (file);

       return scm_values (scm_list_2 (buf, scm_from_size_t (pos)));
     }

After the prelude, only the 'fd' value is used and the C compiler has no
reason to keep the 'file' object around.
If 'scm_c_make_bytevector' results in a garbage collection, 'file' might
not be on the stack or anywhere else and could be finalized, leaving
'read' to read a closed (or, in a multi-threaded program, possibly
re-used) file descriptor. The use of 'scm_remember_upto_here_1'
prevents this, by creating a reference to 'file' after all data
accesses. *Note Garbage Collection Functions::.

'scm_remember_upto_here_1' is only needed on finalizable objects,
because garbage collection of other values is invisible to the program
-- it happens when needed, and is not observable. But if you can, save
yourself the headache and build your program in such a way that it
doesn't need finalization.

1.5 Foreign Objects and Scheme
------------------------------

It is also possible to create foreign objects and object types from
Scheme, and to access fields of foreign objects from Scheme. For
example, the file example from the last section could be equivalently
expressed as:

     (define-module (my-file)
       #:use-module (system foreign-object)
       #:use-module ((oop goops) #:select (make))
       #:export (make-file))

     (define (finalize-file file)
       (let ((fd (struct-ref file 0)))
         (unless (< fd 0)
           (struct-set! file 0 -1)
           (close-fdes fd))))

     (define <file>
       (make-foreign-object-type '<file> '(fd)
                                 #:finalizer finalize-file))

     (define (make-file fd)
       (make <file> #:fd fd))

Here we see that the result of 'make-foreign-object-type', which is the
equivalent of 'scm_make_foreign_object_type', is a struct vtable. *Note
Vtables::, for more information. To instantiate the foreign object,
which is really a Guile struct, we use 'make'. (We could have used
'make-struct/no-tail', but as an implementation detail, finalizers are
attached in the 'initialize' method called by 'make'.) To access the
fields, we use 'struct-ref' and 'struct-set!'. *Note Structure
Basics::.

There is a convenience syntax, 'define-foreign-object-type', that
defines a type along with a constructor, and getters for the fields.
An appropriate invocation of 'define-foreign-object-type' for the file
object type could look like this:

     (use-modules (system foreign-object))

     (define-foreign-object-type <file>
       make-file
       (fd)
       #:finalizer finalize-file)

This defines the '<file>' type with one field, a 'make-file'
constructor, and a getter for the 'fd' field, bound to 'fd'.

Foreign object types are not only vtables but are actually GOOPS
classes, as hinted at above. *Note GOOPS::, for more on Guile's
object-oriented programming system. Thus one can define print and
equality methods using GOOPS:

     (use-modules (oop goops))

     (define-method (write (file <file>) port)
       ;; Assuming existence of the `fd' getter
       (format port "#<<file> ~a>" (fd file)))

     (define-method (equal? (a <file>) (b <file>))
       (eqv? (fd a) (fd b)))

One can even sub-class foreign types.

     (define-class <named-file> (<file>)
       (name #:init-keyword #:name #:init-value #f #:accessor name))

The question arises of how to construct these values, given that
'make-file' returns a plain old '<file>' object. It turns out that you
can use the GOOPS construction interface, where every field of the
foreign object has an associated initialization keyword argument.

     (define* (my-open-file name #:optional (flags O_RDONLY))
       (make <named-file> #:fd (open-fdes name flags) #:name name))

     (define-method (write (file <named-file>) port)
       (format port "#<<file> ~s ~a>" (name file) (fd file)))

*Note Foreign Objects::, for full documentation on the Scheme interface
to foreign objects. *Note GOOPS::, for more on GOOPS.

As a final note, you might wonder how this system supports encapsulation
of sensitive values. First, we have to recognize that some facilities
are essentially unsafe and have global scope. For example, in C, the
integrity and confidentiality of a part of a program is at the mercy of
every other part of that program -- because any part of the program can
read and write anything in its address space.
At the same time, principled access to structured data is organized in C
on lexical boundaries; if you don't expose accessors for your object,
you trust other parts of the program not to work around that barrier.

The situation is not dissimilar in Scheme. Although Scheme's unsafe
constructs are fewer in number than in C, they do exist. The '(system
foreign)' module can be used to violate confidentiality and integrity,
and shouldn't be exposed to untrusted code. Although 'struct-ref' and
'struct-set!' are less unsafe, they still have a cross-cutting
capability of drilling through abstractions. Performing a 'struct-set!'
on a foreign object slot could cause unsafe foreign code to crash.
Ultimately, structures in Scheme are capabilities for abstraction, and
not abstractions themselves.

That leaves us with the lexical capabilities, like constructors and
accessors. Here is where encapsulation lies: the practical degree to
which the innards of your foreign objects are exposed is the degree to
which their accessors are lexically available in user code. If you want
to allow users to reference fields of your foreign object, provide them
with a getter. Otherwise you should assume that the only access to your
object may come from your code, which has the relevant authority, or via
code with access to cross-cutting 'struct-ref' and such, which also has
the cross-cutting authority.

--=-=-=
Content-Type: text/plain

-- 
http://wingolog.org/

--=-=-=--