On 02/15/2015 10:29 AM, Eli Zaretskii wrote:
>> Date: Sun, 15 Feb 2015 10:01:28 -0800
>> From: Daniel Colascione
>> CC: stephen_leake@stephe-leake.org, emacs-devel@gnu.org
>>
>>> Would you like to propose a list of functions that should be in that
>>> table? I don't mean function names, I mean a short description of
>>> each one of them.
>>
>> That depends on the supported module use cases, which are being
>> discussed on other branches of the thread.
>
> You can easily look at the modules already on the branch, I think they
> will give enough data points to at least start with this.

Here's a broad outline of what I have in mind.

We want an ABI powerful enough to let C modules interact with Emacs,
but decoupled enough to let the Emacs core evolve independently. The
best model we have for an interface like that is JNI. See the
specification at [1] for comparison. The JNI designers carefully
considered issues like error propagation, GC integration, thread
safety, type coercion, and so on, so let's depart from the JNI model
only where there's a good reason to do so. We don't need all of JNI's
complexity, but let's try to provide a simple subset with an option to
extend it later.

First, let's define a JNI-like entry point. Call it emacs_module_init:

  struct emacs_runtime;
  extern int emacs_module_init(struct emacs_runtime* ert);

When Emacs loads a module, it uses dlsym (or the platform equivalent)
to find this routine and call it. If it returns 0, the module loaded
successfully; otherwise, we report an error to the caller.

That's the limit of Emacs integration with the system dynamic
loader. ERT provides functions the module can use to do everything
else. We'll declare struct emacs_runtime ABI-stable; we can add fields
to the end of the struct without breaking binary compatibility.

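A minimal sketch of the module side of this entry point might look like
the following. The stand-in struct (only a leading `size' field, as
described in the next section) and the decision to refuse loading when
the runtime is smaller than expected are assumptions for illustration,
not part of the proposal.

```c
#include <stddef.h>

/* Hypothetical stand-in for the runtime structure: only the leading
   `size' field is assumed here. */
struct emacs_runtime {
  size_t size;
};

/* Entry point that Emacs would locate with dlsym (or the platform
   equivalent).  Returns 0 on success, nonzero on failure. */
int emacs_module_init(struct emacs_runtime* ert) {
  /* Assumed policy: refuse to load if the runtime is missing or
     smaller than the structure this module was compiled against. */
  if (ert == NULL || ert->size < sizeof(struct emacs_runtime))
    return 1;

  /* ... a real module would register its functions here ... */
  return 0;
}
```

Because fields are only ever appended to the struct, a runtime that
reports a larger size than the module expects is still compatible, so
the sketch only rejects smaller ones.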
Runtime environment
-------------------

Let's define the emacs_runtime structure like this:

  struct emacs_runtime {
    size_t size;
    emacs_env* (*get_environment)(struct emacs_runtime* ert);
  };

The `size' member tells modules how long the emacs_runtime structure
is. (It's better to use size than an explicit version field: this way,
.size = sizeof(struct emacs_runtime) is always correct.)

Modules use `get_environment' to retrieve a pointer to an `emacs_env'
structure that lets them do interesting things; Emacs also supplies
module-registered callbacks with `emacs_env' pointers on entry.

Thread-local environments
-------------------------

The `get_environment' member lets us do anything else interesting. As
in Java, environments are thread-local. We only support one thread for
the moment, so this constraint is easy to enforce. (Just abort if we
use an emacs_env off the main thread.)

Now we can define emacs_env like this. (Ignore the forward definitions
for the moment.)

  typedef struct emacs_env_25 emacs_env;
  typedef struct emacs_value_tag* emacs_value;

  typedef emacs_value (*emacs_subr)(
    emacs_env* env,
    int nargs,
    emacs_value args[]);

  struct emacs_env_25 {
    size_t size /* struct size */;

    emacs_value (*make_global_reference)(
      emacs_env* env,
      emacs_value any_reference);

    void (*free_global_reference)(
      emacs_env* env,
      emacs_value global_reference);

    bool (*error_check)(emacs_env* env);

    void (*clear_error)(emacs_env* env);

    bool (*get_error)(
      emacs_env* env,
      emacs_value* error_symbol_out,
      emacs_value* error_data_out);

    void (*signal_error)(
      emacs_env* env,
      emacs_value error_symbol,
      emacs_value error_data);

    emacs_value (*make_function)(
      emacs_env* env,
      int min_arity,
      int max_arity,
      emacs_subr function);

    emacs_value (*funcall)(
      emacs_env* env,
      emacs_value function,
      int nargs,
      emacs_value args[]);

    emacs_value (*intern)(
      emacs_env* env,
      const char* symbol_name);

    emacs_value (*type_of)(
      emacs_env* env,
      emacs_value value);

    int64_t (*fixnum_to_int)(
      emacs_env* env,
      emacs_value value);

    emacs_value (*make_fixnum)(
      emacs_env* env,
      int64_t value);

    double (*float_to_c_double)(
      emacs_env* env,
      emacs_value value);

    emacs_value (*make_float)(
      emacs_env* env,
      double value);

    bool (*copy_string_contents)(
      emacs_env* env,
      emacs_value value,
      char* buffer,
      size_t* length_inout);

    emacs_value (*make_string)(
      emacs_env* env,
      const char* contents);
  };

That's a pretty bare-bones interface. If we want, we can add more
members for efficiency and functionality reasons. (Maybe car and cdr.)

emacs_env size
--------------

The first member in emacs_env is `size': that's just the size of the
emacs_env structure provided by Emacs to a module. In emacs.h, we'd
have something like this:

  struct emacs_env_25 { ... };
  struct emacs_env_26 { ... };
  ...;
  typedef struct emacs_env_26 emacs_env;

This way, modules will use the latest version by default, but can
check for a lesser version with something like this:

  if (env->size < sizeof(struct emacs_env_26)) {
    avoid_thing_that_works_only_in_emacs_26();
  }

Memory management
-----------------

Let's steal from JNI some more.

  emacs_value (*make_global_reference)(
    emacs_env* env,
    emacs_value any_reference);

  void (*free_global_reference)(
    emacs_env* env,
    emacs_value global_reference);

We'll represent all Lisp values as an opaque pointer typedef
emacs_value. Each emacs_value is either a local or a global
reference. Local references are valid only on the current thread and
only while the module function Emacs called is on the stack --- think
GCPRO. Global references are valid indefinitely: each one is a GC
root.

Modules can use make_global_reference to allocate a global reference
(i.e., a GC root) for any emacs_value; modules must then free these
references explicitly.

All routines (except make_global_reference) that return emacs_value
values return local references. It's up to modules to register
long-lived references explicitly.

This way, normal module code remains pretty efficient (since it deals
only with stack-allocated local references), but can retain arbitrary
data. Also, we don't lock ourselves into conservative stack-scanning
GC.

Error handling
--------------

We can't longjmp through arbitrary C code! We have to let modules
treat errors more conventionally. Let's use the JNI approach:

  bool (*error_check)(emacs_env* env);

  void (*clear_error)(emacs_env* env);

  bool (*get_error)(
    emacs_env* env,
    emacs_value* error_symbol_out,
    emacs_value* error_data_out);

  void (*signal_error)(
    emacs_env* env,
    emacs_value error_symbol,
    emacs_value error_data);

We'll give each thread (initially, our only thread) its own
pending-error information, consisting of a pending-error flag plus the
error symbol and data.

When Emacs calls a module function, the current thread's pending-error
flag will be clear. When that module returns to Emacs, if the
thread's pending-error flag is set, Emacs signals the condition
corresponding to the current thread's error information.

When the module calls an Emacs routine that would ordinarily signal,
Emacs catches the signal at the stack frame just before control flow
would return to the module, sets the pending-error flag, and returns
to the module normally. (Functions that would return a value on
success instead return a dummy value, like 0 or NULL.)

`error_check' just returns true iff an error is pending.

`clear_error' removes pending error information.

`get_error' retrieves the pending error information, without clearing
it, and returns true. If no error is set, it returns false.

`signal_error' sets the current thread's pending-error flag.

To simplify the interface, we can treat `catch' and `throw' as special
kinds of error.

Like JNI, we can just declare that it's illegal to call all but a few
specially-marked functions (like global reference deregistration) with
a pending error.

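The check-inspect-clear pattern this implies for module code can be
sketched against a mock environment. Everything below is an assumption
made for illustration: the `mock_env' layout, the concrete
`emacs_value_tag' body, and the `failing_call' routine are hypothetical
stand-ins, and only the four error-related members mirror the proposal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock value type: the proposal keeps emacs_value opaque; a concrete
   body is given here only so the sketch can create values. */
struct emacs_value_tag { int id; };
typedef struct emacs_value_tag* emacs_value;
typedef struct mock_env emacs_env;

/* Mock environment: the four error members from the proposal plus the
   per-thread pending-error state they operate on. */
struct mock_env {
  bool (*error_check)(emacs_env* env);
  void (*clear_error)(emacs_env* env);
  bool (*get_error)(emacs_env* env, emacs_value* error_symbol_out,
                    emacs_value* error_data_out);
  void (*signal_error)(emacs_env* env, emacs_value error_symbol,
                       emacs_value error_data);
  bool pending;        /* mock-only: pending-error flag */
  emacs_value symbol;  /* mock-only: pending error symbol */
  emacs_value data;    /* mock-only: pending error data */
};

static bool mock_error_check(emacs_env* env) { return env->pending; }

static void mock_clear_error(emacs_env* env) {
  env->pending = false;
  env->symbol = NULL;
  env->data = NULL;
}

static bool mock_get_error(emacs_env* env, emacs_value* symbol_out,
                           emacs_value* data_out) {
  if (!env->pending) return false;
  *symbol_out = env->symbol;  /* report without clearing */
  *data_out = env->data;
  return true;
}

static void mock_signal_error(emacs_env* env, emacs_value symbol,
                              emacs_value data) {
  env->pending = true;
  env->symbol = symbol;
  env->data = data;
}

emacs_env make_mock_env(void) {
  emacs_env env = { mock_error_check, mock_clear_error, mock_get_error,
                    mock_signal_error, false, NULL, NULL };
  return env;
}

/* Hypothetical Emacs routine that would ordinarily signal: it records
   the error and returns a dummy value instead of unwinding. */
emacs_value failing_call(emacs_env* env, emacs_value error_symbol) {
  env->signal_error(env, error_symbol, NULL);
  return NULL;
}

/* Module code: call, check, inspect, then clear so that nothing is
   signaled when control returns to Emacs.  Returns true if the
   expected error was seen and handled. */
bool call_and_recover(emacs_env* env, emacs_value expected_symbol) {
  emacs_value symbol = NULL, data = NULL;
  (void) failing_call(env, expected_symbol);
  if (!env->error_check(env)) return false;
  env->get_error(env, &symbol, &data);
  env->clear_error(env);
  return symbol == expected_symbol;
}
```

A module that wants the error to propagate would simply return without
calling `clear_error', leaving the flag set for Emacs to act on.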
Function registration
---------------------

  typedef emacs_value (*emacs_subr)(
    emacs_env* env,
    int nargs,
    emacs_value args[]);

  emacs_value (*make_function)(
    emacs_env* env,
    int min_arity,
    int max_arity,
    emacs_subr function);

  emacs_value (*funcall)(
    emacs_env* env,
    emacs_value function,
    int nargs,
    emacs_value args[]);

  emacs_value (*intern)(
    emacs_env* env,
    const char* symbol_name);

Modules create function values using make_function, which works like
lambda; max_arity == -1 indicates a varargs function. Modules can
register functions in the global namespace by calling a Lisp-level
function; we can also just provide a `defun' API, although strictly
speaking, one would be superfluous.

When Lisp calls a module-defined function object, Emacs calls the
emacs_subr callback with which the function was defined.

Modules can call Lisp functions using `funcall', which does the
obvious thing. If Lisp signals or throws, `funcall' returns NULL.

`intern' also does the obvious thing.

Type coercion
-------------

emacs_value is not very useful without letting C look inside specific
types of value.

  emacs_value (*type_of)(
    emacs_env* env,
    emacs_value value);

Like Lisp type-of: returns a symbol.

  int64_t (*fixnum_to_int)(
    emacs_env* env,
    emacs_value value);

  emacs_value (*make_fixnum)(
    emacs_env* env,
    int64_t value);

These functions do the obvious thing. They signal an error on type
mismatch. We use int64_t to handle big-integer Emacs variants on
32-bit platforms.

  double (*float_to_c_double)(
    emacs_env* env,
    emacs_value value);

  emacs_value (*make_float)(
    emacs_env* env,
    double value);

These functions do the obvious thing. They signal an error on type
mismatch.

  bool (*copy_string_contents)(
    emacs_env* env,
    emacs_value value,
    char* buffer,
    size_t* length_inout);

  emacs_value (*make_string)(
    emacs_env* env,
    const char* contents);

These functions let C code access Lisp strings. I imagine we'll
always produce and consume UTF-8.

`copy_string_contents' copies into a caller-allocated buffer instead
of returning a char* callers must free() --- this way, modules and the
Emacs core don't need to share the same C runtime. We can deal with
the buffer-length issue in a number of ways: here, we just accept the
destination buffer size in *length_inout and write the total length of
the string to *length_inout on normal return. We just truncate if
we're given too short a buffer and don't signal an error; this way,
callers can loop around and allocate a sufficiently large buffer for a
string's contents.

Other functionality
-------------------

I think the interface above is enough for complete functionality in a
module, but for efficiency, we might want to expose additional
facilities, like access to a unibyte buffer's raw representation.

Convenience library
-------------------

This kind of interface is adequate, but asking callers to
env->intern() every Emacs facility they want to use is inconvenient.
It'd be nice to provide a statically-linked convenience library,
written in terms of the ABI-stable interface, that made it
straightforward to set text properties, visit files, and so on.

For example, we could provide a function like this:

  bool
  emacs_find_file(emacs_env* env, const char* filename)
  {
    /* Build the Lisp string argument; bail out if that failed. */
    emacs_value e_filename = env->make_string(env, filename);
    if(env->error_check(env)) return false;
    /* Look up the Lisp function by name. */
    emacs_value e_find_file = env->intern(env, "find-file");
    if(env->error_check(env)) return false;
    /* funcall takes the argument count before the argument array. */
    return env->funcall(env, e_find_file, 1, &e_filename) != NULL;
  }

Questions
---------

1) Do we need JNI-style local variable frames and functions that
   release local references?

2) Maybe we want a separate, non-emacs_value type for global
   references?

3) How exactly do we represent catch/throw values?

4) Do we need to use int64_t for integers?

5) Do we need to provide direct access to string contents?

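As an illustration of the truncate-and-retry convention described for
`copy_string_contents' above, here is a small sketch. It is not the
proposed API itself: the mock operates on a plain C string rather than
an emacs_value and takes no environment, and it assumes that the
lengths passed through *length_inout count the terminating NUL, which
the text leaves unspecified. The names `mock_copy_string_contents' and
`copy_whole_string' are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Mock of the proposed semantics: *length_inout carries the buffer
   size in, and the total size the string needs out.  Copies as much
   as fits, always NUL-terminates a non-empty buffer, never signals. */
static bool mock_copy_string_contents(const char* lisp_string,
                                      char* buffer,
                                      size_t* length_inout) {
  size_t needed = strlen(lisp_string) + 1;  /* assumed to include NUL */
  size_t room = *length_inout;
  if (room > 0) {
    size_t n = needed < room ? needed : room;
    memcpy(buffer, lisp_string, n - 1);
    buffer[n - 1] = '\0';
  }
  *length_inout = needed;
  return true;
}

/* Caller-side loop: start with a guess, grow the buffer if the
   reported length exceeds it.  Returns a malloc'd copy that the caller
   frees, or NULL if allocation fails. */
char* copy_whole_string(const char* lisp_string) {
  size_t capacity = 16;
  char* buffer = malloc(capacity);
  while (buffer != NULL) {
    size_t length = capacity;
    mock_copy_string_contents(lisp_string, buffer, &length);
    if (length <= capacity)
      return buffer;            /* the whole string fit */
    capacity = length;          /* retry with the reported size */
    char* bigger = realloc(buffer, capacity);
    if (bigger == NULL)
      free(buffer);
    buffer = bigger;
  }
  return NULL;
}
```

Since the buffer is allocated and freed entirely on the module's side,
the sketch never hands memory across the module boundary, which is the
point of the caller-allocated design.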
[1] http://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/jniTOC.html

[2] Let's not support JNI's deprecated implicit registration system,
where modules can register functions with the runtime by exporting
symbols with certain magical names.