From: Daniel Colascione
To: Eli Zaretskii
Cc: stephen_leake@stephe-leake.org, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Mon, 16 Feb 2015 10:22:18 -0800
Message-ID: <54E2355A.9010900@dancol.org>
In-Reply-To: <833865vp4d.fsf@gnu.org>

On 02/16/2015 07:43 AM, Eli Zaretskii wrote:
>> Date: Sun, 15 Feb 2015 12:20:35 -0800
>> From: Daniel Colascione
>> CC: stephen_leake@stephe-leake.org, emacs-devel@gnu.org
>>
>> Here's a broad outline of what I have in mind.
>
> Thanks. I think the next step is for someone to try rewriting a
> couple of modules on the branch in terms of this specification, and
> see if something is missing. Volunteers are welcome.
>
> Some comments on your write-up:
>
>> When Emacs loads a module, it uses dlsym (or the platform equivalent)
>> to find this routine and call it. If it returns 0, the module loaded
>> successfully; otherwise, we report an error to the caller.
>
> If this is all we need, then there's no need for libltdl, I think.
>
>> struct emacs_runtime {
>>   size_t size;
>>   struct emacs_env* (*get_environment)(struct emacs_runtime* ert);
>> };
>>
>> The `size' member tells modules how long the emacs_runtime structure
>> is. (It's better to use size than an explicit version field: this way,
>> .size = sizeof(struct emacs_runtime) is always correct.)
>
> This approach requires us to change the size each time we change the
> layout, even if the change itself leaves the size intact. Using a
> version field doesn't have this disadvantage.

Emacs is allowed to pass a new version of the runtime and context
structures to modules expecting old versions. That can only work if the
structures are append-only: existing members never move or change type,
so the only possible change is a longer structure, and `size' is
adequate to describe it.
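The append-only argument can be made concrete. Below is a minimal C sketch, assuming a hypothetical second version of the runtime structure that appends one field (`get_api_level` and `emacs_get_api_level` are invented for illustration and are not part of the proposal). A module compiled against the newer layout checks `size` before touching the appended field, so it still behaves when an older Emacs hands it the shorter structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct emacs_env;  /* opaque in this sketch */

/* Version 1, as in the outline. */
struct emacs_runtime_v1 {
  size_t size;
  struct emacs_env *(*get_environment)(struct emacs_runtime_v1 *ert);
};

/* Hypothetical version 2: identical prefix, one member appended. */
struct emacs_runtime_v2 {
  size_t size;
  struct emacs_env *(*get_environment)(struct emacs_runtime_v2 *ert);
  int (*get_api_level)(struct emacs_runtime_v2 *ert);  /* hypothetical */
};

/* True if a structure of SIZE bytes contains the member that
   starts at offset OFF and is LEN bytes long. */
static bool
runtime_has_member(size_t size, size_t off, size_t len)
{
  return size >= off + len;
}

/* Module side: read the appended member only if the structure
   Emacs passed is long enough to contain it. */
static int
module_api_level(struct emacs_runtime_v2 *ert)
{
  if (runtime_has_member(ert->size,
                         offsetof(struct emacs_runtime_v2, get_api_level),
                         sizeof ert->get_api_level)
      && ert->get_api_level != NULL)
    return ert->get_api_level(ert);
  return 1;  /* older Emacs: assume the original interface */
}

/* What a hypothetical newer Emacs might install in the new slot. */
static int
emacs_get_api_level(struct emacs_runtime_v2 *ert)
{
  (void) ert;
  return 2;
}
```

Because every member a module knows about sits at a fixed offset, comparing `size` against `offsetof` plus `sizeof` is all the version information the module needs.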
>> Thread-local environments
>> -------------------------
>>
>> The `get_environment' member lets us do anything else interesting. As
>> in Java, environments are thread-local. We only support one thread for
>> the moment, so this constraint is easy to enforce. (Just abort if we
>> use an emacs_env off the main thread.)
>
> Is it wise to design for threads at this point? The only suggestion
> on the table for "threads" is a far cry from real threads, and the
> current Lisp interpreter is otherwise thread-unsafe. Won't the thread
> infrastructure add a lot of cruft that will go unneeded for the
> foreseeable future?

If it were a lot of cruft, I might agree with you, but it isn't (for
now it amounts to get_environment plus a main-thread check), so I'd
rather make the design future-proof.

>> We'll represent all Lisp values as an opaque pointer typedef
>> emacs_value. Each emacs_value is either a local or a global
>> reference. Local references are valid only on the current thread and
>> only while the module function Emacs called is on the stack --- think
>> GCPRO. Global references are valid indefinitely: each one is a GC
>> root.
>
> With massive use of calls to Lisp functions from modules (since we
> don't provide them with direct access in C to internals of many
> objects), how can we ensure GC doesn't strike while the function that
> created an emacs_value is still on the callstack?

We don't prevent it: calls from modules to Lisp can certainly GC. That's
why we have local references.

> You say "we don't
> lock ourselves into conservative stack-scanning GC", which I interpret
> as saying you don't want to rely on stack scanning to avoid a
> destructive GC in this case. But if we don't rely on that, where's
> the guarantee that such emacs_value will survive GC?

Emacs records every emacs_value in a local reference table before
handing it to module code, and GC treats that table as a root, so the
referenced objects survive until the module function returns.

>> We'll represent all Lisp values as an opaque pointer typedef
>> emacs_value.
>
> This doesn't play well with --with-wide-int, where a value can be
> wider than a pointer.
> I think we should instead go with intmax_t or
> intptr_t, whichever is wider on the host.

emacs_value objects are not literally Lisp_Object values. They're
indirected through a reference table (either local or global) so that
we can GC them. A pointer can address any entry in that table, so a
pointer-sized emacs_value is fine here. (In the wide-int case, the
pointed-to table entries are wider than pointers, but modules don't
need to know that.)

>> Function registration
>> ---------------------
>>
>> typedef emacs_value (*emacs_subr)(
>>   emacs_env* env,
>>   int nargs,
>>   emacs_value args[]);
>>
>> emacs_value (*make_function)(
>>   emacs_env* env,
>>   int min_arity,
>>   int max_arity,
>>   emacs_subr function);
>
> What about the doc string?

We'll set that at the Lisp level. make_function is equivalent to
lambda, not defun. We can either let modules call `defalias' (what
`defun' expands to) directly or provide a special Lisp-level function
just for them.

>> emacs_value (*funcall)(
>>   emacs_env* env,
>>   emacs_value function,
>>   int nargs,
>>   emacs_value args[]);
>
> Shouldn't funcall use emacs_subr?

Why? It works like Lisp funcall: you can give it a function object, a
symbol, or a lambda expression. Remember that this facility lets an
Emacs module call Lisp, not the other way around.

>> Modules can register functions in the global namespace by calling a
>> Lisp-level function
>
> This is unclear, can you elaborate? What happens if a function is not
> "registered"? What's its status then?

A registered function has the same status a Lisp function does after
defun. An unregistered one is like an anonymous lambda: it is reachable
only through the function object make_function returned.

>> When Lisp calls a module-defined function object, Emacs calls the
>> emacs_subr callback with which the function was defined.
>
> This is a change in the Lisp interpreter, I think. Why do we need
> this?

Why would it be a change? Implementation-wise, all we need is an
Emacs-internal magic function for calling into modules; we can implement
module-registered functions as Lisp-level lambdas that close over all
the stuff we need to invoke the module code.
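To make the registration model concrete, here is a self-contained C sketch of the module side plus a toy stand-in for the Emacs-internal trampoline described above. Only the `emacs_subr` callback shape comes from the outline; `emacs_env` is a stub, values are toy pointers to `long`, and `struct module_function`, `make_module_function`, and `call_module_function` are hypothetical names. The sketch only shows that the arity check and the call happen on the Emacs side, so the module just supplies a C callback.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: in the proposal these types are opaque to modules. */
typedef struct emacs_env { int unused; } emacs_env;
typedef long *emacs_value;          /* toy: a value points at a long */

/* Callback shape from the outline. */
typedef emacs_value (*emacs_subr)(emacs_env *env, int nargs,
                                  emacs_value args[]);

/* Hypothetical record behind a module function object.  Like a
   lambda, it has no global name until Lisp code binds one. */
struct module_function {
  int min_arity;
  int max_arity;                    /* assumption: -1 means "no limit" */
  emacs_subr function;
};

/* Toy counterpart of make_function; it returns the record itself
   instead of an emacs_value. */
static struct module_function
make_module_function(emacs_env *env, int min_arity, int max_arity,
                     emacs_subr function)
{
  (void) env;
  struct module_function f = { min_arity, max_arity, function };
  return f;
}

/* Toy trampoline: Emacs, not the module, checks arity and then calls
   the module's C callback.  NULL stands in for signaling
   wrong-number-of-arguments. */
static emacs_value
call_module_function(emacs_env *env, const struct module_function *f,
                     int nargs, emacs_value args[])
{
  if (nargs < f->min_arity || (f->max_arity >= 0 && nargs > f->max_arity))
    return NULL;
  return f->function(env, nargs, args);
}

/* Module side: an emacs_subr that sums its arguments.  The static
   result cell is toy storage, not how Emacs would allocate values. */
static long sum_result;

static emacs_value
module_sum(emacs_env *env, int nargs, emacs_value args[])
{
  (void) env;
  sum_result = 0;
  for (int i = 0; i < nargs; i++)
    sum_result += *args[i];
  return &sum_result;
}
```

In the proposal the object make_function returns would be a Lisp-level lambda closing over this kind of record, and a Lisp wrapper would give it a global name.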
>> If Lisp signals or throws, `funcall' returns NULL.
>
> I suggest some other value or indication of that. NULL is a valid
> return value, so usurping it for errors might be too harsh.

No, it isn't. NULL is never a valid emacs_value: Qnil is distinct from
NULL in this model because emacs_value is not a Lisp_Object in
disguise. Qnil is not special here.

> Or maybe I don't understand how Lisp functions will return values to
> the module, under your suggestion. Can you describe that?
>
>> `intern' also does the obvious thing.
>
> Do we need 'unintern' as well?

Modules can call unintern through Lisp.

>> emacs_value (*type_of)(
>>   emacs_env* env,
>>   emacs_value value);
>>
>> Like Lisp type-of: returns a symbol.
>
> What is a "symbol", from the module's C code POV?

It's an emacs_value. You can compare symbols (by calling `eq'), call
them as functions, or use them as function arguments. That's
sufficient. Come to think of it, though, we do need a C-level `eq'.

> You show no
> functions to access attributes of symbols, so it must be either one of
> the other types, like an integer or a string, or a C primitive data
> type, like a char * pointer.

That's deliberate. Modules should be calling Lisp functions to do that
work.

>> int64_t (*fixnum_to_int)(
>>   emacs_env* env,
>>   emacs_value value);
>>
>> emacs_value (*make_fixnum)(
>>   emacs_env* env,
>>   int64_t value);
>>
>> These functions do the obvious thing. They signal an error on type
>> mismatch. We use int64_t to handle big-integer Emacs variants on
>> 32-bit platforms.
>
> The last bit means we will need a utility function to return the valid
> range of integers, so that modules can be written to support 32-bit
> and 64-bit use cases without raising errors.

Modules can read most-positive-fixnum and most-negative-fixnum, for
example by calling symbol-value on them. A utility function that did
that wouldn't hurt, though.
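Why NULL can mean "a signal or throw happened" while nil remains an ordinary value follows from emacs_value being a handle into a reference table. The C sketch below assumes a toy table of `int64_t` slots (so the stored object may be wider than a pointer, as in the --with-wide-int case) and encodes nil as slot value 0 purely for illustration; `local_ref_table`, `make_local_ref`, and `toy_funcall` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t lisp_object;         /* toy: may be wider than a pointer */
#define TOY_QNIL ((lisp_object) 0)   /* toy encoding of nil */

/* A value handed to a module is a pointer to a table slot. */
typedef lisp_object *emacs_value;

#define LOCAL_REFS_MAX 64

/* Hypothetical per-call table of local references.  Emacs would
   treat every used slot as a GC root until the module function
   returns. */
struct local_ref_table {
  lisp_object slots[LOCAL_REFS_MAX];
  size_t used;
};

static emacs_value
make_local_ref(struct local_ref_table *t, lisp_object obj)
{
  if (t->used == LOCAL_REFS_MAX)
    return NULL;                     /* table full: treat as an error */
  t->slots[t->used] = obj;
  return &t->slots[t->used++];
}

/* Toy funcall: "signals" when asked to, otherwise returns nil.
   Only the return convention matters here. */
static emacs_value
toy_funcall(struct local_ref_table *t, int should_signal)
{
  if (should_signal)
    return NULL;                     /* nonlocal exit: no value */
  return make_local_ref(t, TOY_QNIL);
}
```

Even though nil's slot holds 0, its handle is the slot's address, which is never NULL.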
>> bool (*copy_string_contents)(
>>   emacs_env* env,
>>   emacs_value value,
>>   char* buffer,
>>   size_t* length_inout);
>>
>> emacs_value (*make_string)(
>>   emacs_env* env,
>>   const char* contents);
>>
>> These functions let C code access Lisp strings. I imagine we'll
>> always produce and consume UTF-8.
>
> Strings in Emacs are of limited usability if you cannot encode and
> decode them. So this needs to be part of supported functionality, I
> think.
>
> More generally, modules that would like to process buffer or string
> text will have to be able to deal with Emacs's internal encoding of
> text, which means macros and functions we use in the core. The
> alternative of working only on UTF-8 encoded replicas means we'd need
> to encode and decode text across the module boundaries -- that's a lot
> of consing.

Is the Emacs internal encoding stable? I don't object in principle to
letting modules access the internal representation of string contents,
but it enlarges the interface we'll have to support forever. Is consing
strings really a bottleneck in the use cases we have in mind?

>> `copy_string_contents' copies into a caller-allocated buffer instead
>> of returning a char* callers must free() --- this way, modules and the
>> Emacs core don't need to share the same C runtime. We can deal with
>> the buffer-length issue in a number of ways: here, we just accept the
>> destination buffer size in *length_inout and write the total length of
>> the string to *length_inout on normal return. We just truncate if
>> we're given too short a buffer and don't signal an error; this way,
>> callers can loop around and allocate a sufficiently large buffer for a
>> string's contents.
>
> That's an annoyance, IMO; why not provide a function to return the
> required size? Its implementation is trivial.

Sure --- as part of the non-ABI convenience library.
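The "loop around and allocate" pattern for copy_string_contents looks roughly like this in C. This is a sketch against a mock, not Emacs code: a "Lisp string" is a plain C string, `mock_copy_string_contents` and `module_string_dup` are hypothetical names, and the mock assumes the reported length counts the terminating NUL, which the outline leaves open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins: a "Lisp string" is just a NUL-terminated C string. */
typedef struct emacs_env { int unused; } emacs_env;
typedef const char *emacs_value;

/* Mock of the proposed copy_string_contents.  Assumptions: the
   reported length counts the terminating NUL, and a too-short buffer
   receives a truncated (not NUL-terminated) prefix without error. */
static bool
mock_copy_string_contents(emacs_env *env, emacs_value value,
                          char *buffer, size_t *length_inout)
{
  (void) env;
  size_t total = strlen(value) + 1;
  size_t n = *length_inout < total ? *length_inout : total;
  if (buffer != NULL && n > 0)
    memcpy(buffer, value, n);
  *length_inout = total;
  return true;
}

/* Caller-side loop: try a guessed size, and if the reported total is
   larger, retry with exactly that size.  The buffer comes from the
   module's own malloc, so Emacs never frees module memory. */
static char *
module_string_dup(emacs_env *env, emacs_value value)
{
  size_t cap = 16;
  for (;;) {
    char *buf = malloc(cap);
    if (buf == NULL)
      return NULL;
    size_t len = cap;
    if (!mock_copy_string_contents(env, value, buf, &len)) {
      free(buf);
      return NULL;
    }
    if (len <= cap)
      return buf;                    /* whole string, NUL included, fit */
    free(buf);
    cap = len;                       /* retry with the reported size */
  }
}
```

At most two calls are needed, since the first call already reports the exact size. That is also all a convenience-library "required size" helper would have to wrap.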
>> I think the interface above is enough for complete functionality in a
>> module, but for efficiency, we might want to expose additional
>> facilities, like access to a unibyte buffer's raw representation.
>
> I can envision a few additional missing bits:
>
>  . direct access to buffer text (using buffer-substring means consing
>    a lot of strings)

Is that really a problem for the use cases we want to support?

>  . creation of opaque objects that should be returned to Lisp (like
>    handles to objects managed by modules)

Sure. That'll require a new vectorlike type.

>  . not sure how a module will "provide" its feature

It doesn't. You don't require a module: you load it. Most modules will
come with a small Lisp wrapper, just like Python modules.

>  . some operations that must be efficient because they are typically
>    done in inner loops, like regexp matching and accessing syntax of
>    characters: doing that via Lisp will probably be horribly slow

That seems reasonable.

>> Convenience library
>> -------------------
>
> One thing that's inconvenient is the need to drag the environment
> pointer through all the calls. Why exactly is that needed?

Modules have no other way of accessing Emacs internals. Dragging one
parameter through the system isn't that bad and lets us sanely track
which module is doing what.

>> bool
>> emacs_find_file(emacs_env* env, const char* filename)
>> {
>>   emacs_value e_filename = env->make_string(env, filename);
>>   if(env->error_check(env)) return false;
>>   emacs_value e_find_file = env->intern(env, "find-file");
>>   if(env->error_check(env)) return false;
>>   return env->funcall(env, e_find_file, 1, &e_filename) != NULL;
>> }
>
> This kind of code looks tedious to me, no matter if it's in Emacs or
> in the module. Just an observation.

We need explicit error handling. Within Emacs, we can use longjmp for
error handling. In modules, we can't: a longjmp across module frames
would skip whatever cleanup the module code needs to do.

> Also, the buffer returned by find-file when it returns normally is
> lost here, isn't it?

Sure.
It's just a tiny example, and the returned buffer itself is still GCed.
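The explicit error propagation in the emacs_find_file example can be exercised against a mock environment. In this C sketch, the toy `make_string`/`intern`/`funcall` implementations and the `pending_error` flag are assumptions made so the code runs, and `error_check` is assumed to report whether the last call signaled. Only the shape of `emacs_find_file` comes from the thread, with `funcall`'s arguments in the order the outline declares (function, then `nargs`, then `args`). Each call is followed by a check and an early `return false`, because Emacs cannot `longjmp` out through the module's frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct emacs_env emacs_env;
typedef const void *emacs_value;     /* toy opaque handle */

/* Only the members emacs_find_file uses, plus toy bookkeeping. */
struct emacs_env {
  emacs_value (*make_string)(emacs_env *env, const char *contents);
  emacs_value (*intern)(emacs_env *env, const char *name);
  emacs_value (*funcall)(emacs_env *env, emacs_value function,
                         int nargs, emacs_value args[]);
  bool (*error_check)(emacs_env *env);
  bool pending_error;                /* toy state, not part of the proposal */
};

static emacs_value
toy_make_string(emacs_env *env, const char *contents)
{
  env->pending_error = false;
  return contents;                   /* toy: the C string is the value */
}

static emacs_value
toy_intern(emacs_env *env, const char *name)
{
  env->pending_error = false;
  return name;
}

/* Pretends find-file signals when given an empty file name. */
static emacs_value
toy_funcall(emacs_env *env, emacs_value function, int nargs,
            emacs_value args[])
{
  (void) function;
  env->pending_error =
    (nargs != 1 || ((const char *) args[0])[0] == '\0');
  return env->pending_error ? NULL : args[0];
}

static bool
toy_error_check(emacs_env *env)
{
  return env->pending_error;
}

/* The thread's example, with funcall's arguments in declared order. */
static bool
emacs_find_file(emacs_env *env, const char *filename)
{
  emacs_value e_filename = env->make_string(env, filename);
  if (env->error_check(env)) return false;
  emacs_value e_find_file = env->intern(env, "find-file");
  if (env->error_check(env)) return false;
  return env->funcall(env, e_find_file, 1, &e_filename) != NULL;
}
```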