From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Philipp Stephani <p.stephani2@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Thu, 19 Nov 2015 22:41:09 +0000
Message-ID: <CAArVCkTwVbA58_wfj7O-Et83M8YJ9jfpCKhYn466BYO8T2cG0A@mail.gmail.com>
References: <CA+5B0FOuWbpBUTsrE4tzzoLxACPQ-mgxx7zJKyW2LR77QRM=Ug@mail.gmail.com>
	<83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com>
	<CA+5B0FPp9nYEmoyDLrutJpcOZBtpV9kxx7LdPqrsj26rnj11qA@mail.gmail.com>
	<CAArVCkS515CVbS1UfavFGAq0dGO=e_mGftMbhF_eBw3SSu3Xjg@mail.gmail.com>
	<877flswse5.fsf@lifelogs.com>
	<CAArVCkT0M8o4MDP1RaP-r9JqumoQaMbhANRrycSEyyCj+mqUcA@mail.gmail.com>
	<8737wgw7kf.fsf@lifelogs.com>
	<CA+5B0FOGrn01XZzKJvXdWLPL62ONUzoEBfQRwLiKqLmd6Ta3RA@mail.gmail.com>
	<87io5bv1it.fsf@lifelogs.com>
	<CA+5B0FOp8Ub1+V_2G4CC1r2aG1hLKmZdSic59MfOy=9QoovSRQ@mail.gmail.com>
	<87egfzuwca.fsf@lifelogs.com>
	<CAArVCkSEHxSd3X2PnEvRJk5n1wOR0y9neU7AxGYEHSqKRG+y3Q@mail.gmail.com>
	<876118u6f2.fsf@lifelogs.com>
	<CA+5B0FPz-vo+Y=38=21jRQuEHANzFG_cf3tPDiwEbK2TO4+JdA@mail.gmail.com>
	<CA+5B0FNW48d3S5CJfxHK9HHVHPmuYqaT3K9tn5MVTgv_qas5Rw@mail.gmail.com>
	<ryhmvud820v.fsf@dod.no>
	<CA+5B0FMU1Ry6mRSinyV5Ar8DaL4VciEUEbTe1NcXZUQ2-4y4TA@mail.gmail.com>
	<8737w3qero.fsf@lifelogs.com> <831tbn9g9j.fsf@gnu.org>
	<878u5upw7o.fsf@lifelogs.com>
	<83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a114450c81561190524ec75f0
X-Trace: ger.gmane.org 1447972910 16890 80.91.229.3 (19 Nov 2015 22:41:50 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 19 Nov 2015 22:41:50 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>, Ted Zlatanov <tzz@lifelogs.com>, 
	=?UTF-8?Q?Aur=C3=A9lien_Aptel?= <aurelien.aptel+emacs@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Nov 19 23:41:36 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1ZzXtW-00016k-6c
	for ged-emacs-devel@m.gmane.org; Thu, 19 Nov 2015 23:41:34 +0100
Original-Received: from localhost ([::1]:44357 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1ZzXtV-0003G8-GC
	for ged-emacs-devel@m.gmane.org; Thu, 19 Nov 2015 17:41:33 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:53910)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1ZzXtP-0003G0-CD
	for emacs-devel@gnu.org; Thu, 19 Nov 2015 17:41:30 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1ZzXtL-0002Ca-US
	for emacs-devel@gnu.org; Thu, 19 Nov 2015 17:41:27 -0500
Original-Received: from mail-wm0-x22b.google.com ([2a00:1450:400c:c09::22b]:34733)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>)
	id 1ZzXtH-00023A-Cz; Thu, 19 Nov 2015 17:41:19 -0500
Original-Received: by wmvv187 with SMTP id v187so47930775wmv.1;
	Thu, 19 Nov 2015 14:41:18 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:references:in-reply-to:from:date:message-id:subject:to
	:cc:content-type;
	bh=uRcNAO/zppDDwT/Wghss0CY1VGSUVJYt8NCtVkFaGDo=;
	b=dFxLdfq6mRqOWIFS4ydTQC9EJNFEQwojUjZIssOc7CqfVs/91eQRT9eX3cdOKEeRac
	TFSi38jsTyEdOIZwgOwCQ82xql9gZtz7i0y7AmVdum61Ct2qEO9jPklO4uBqtl1RCzhH
	4tauCmeugD7TrJXqZN1q0WfIB7Rkqp8o5oXkaaeacPS0tVI5LAHNyh64kq3ydtQdTP19
	lCuxd+2M0ZvcpIWsRg720dL1Wb4aFB/xYhP3N0slv4qIIOmBhTvnySlwuzWKxTt/B90M
	9qrEU379pNJ1e21Vg0bvZFQMKSjwc7uXSqL4RCm/bHraIX99PUDhdYZbqw+QJB1j3NeR
	I0EQ==
X-Received: by 10.28.7.8 with SMTP id 8mr192395wmh.45.1447972878768; Thu, 19
	Nov 2015 14:41:18 -0800 (PST)
In-Reply-To: <83y4du80xo.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2a00:1450:400c:c09::22b
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:194826
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/194826>

--001a114450c81561190524ec75f0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Thanks for the thorough review; I'll prepare patches for all of these
issues.

Eli Zaretskii <eliz@gnu.org> schrieb am Do., 19. Nov. 2015 um 16:28 Uhr:

> Thanks for making this possible.
>
> Please allow me a few questions/comments about the dynamic modules
> code, based on some initial reading:
>
> Why module.c, but confusingly emacs_module.h?  Are there any reasons
> not to use the same base name, like we do for other C sources?
>
>
emacs_module.h is intended to be included by module authors. Therefore its
name needs to be globally unique, which in practice means it needs to start
with 'emacs_'.
module.c could be renamed accordingly, if you prefer that. However,
Aur=C3=A9lien picked the simple name because it's not required to be global=
ly
unique.


> This comment in module.c:
>
>   /* If checking is enabled, abort if the current thread is not the
>      Emacs main thread. */
>   static void check_main_thread (void);
>
> confused me, because a later comment says:
>
>   void module_init (void)
>   {
>     /* It is not guaranteed that dynamic initializers run in the main
> thread,
>        therefore we detect the main thread here. */
>
> If dynamic initializers might run in a thread different from the Emacs
> main thread, then the code in module_init will record that other
> thread, not the Emacs main thread, right?
>

No: main calls module_init explicitly, so it is guaranteed to run in the
main thread.


>
> Also, another comment:
>
>   /* On Windows, we store both a handle to the main thread and the
>      thread ID because the latter can be reused when a thread
>      terminates. */
>
> seems to imply that 'main_thread' here is not the Emacs's main thread,
> because that thread never terminates as long as the Emacs session is
> alive.
>
> So what's the deal here? what does this thread checking supposed to
> detect?
>

This guards against the Emacs main thread having exited while module code
in some other thread is still running and attempting to call Emacs
functions. Doing so is undefined behavior, but because that case is
somewhat subtle we added an explicit check for it, active whenever
checking is enabled.
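
For illustration, here is a minimal sketch of the basic idea, assuming a
POSIX system. It is not the code in module.c: the names
record_main_thread and on_main_thread are made up for this example, and
it does not cover the Windows handle/ID case discussed above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the bookkeeping in module.c.  */
static pthread_t main_thread;
static bool main_thread_known = false;

/* Meant to be called explicitly from main, so that pthread_self ()
   is the main thread at this point.  */
void
record_main_thread (void)
{
  main_thread = pthread_self ();
  main_thread_known = true;
}

/* Return true if the caller runs on the recorded thread.  */
bool
on_main_thread (void)
{
  assert (main_thread_known);
  return pthread_equal (pthread_self (), main_thread) != 0;
}
```

A checking build could then abort whenever on_main_thread () returns
false at the start of a module function.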


>
> In this code from module.c:
>
>       Lisp_Object value =3D HASH_VALUE (h, i);
>       eassert (NATNUMP (value));
>       const EMACS_UINT refcount =3D XFASTINT (value);
>       if (refcount >=3D MOST_POSITIVE_FIXNUM)
>         {
>           module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil);
>           return NULL;
>         }
>
> how can the 'if' clause ever be true?  refcount is an Emacs integer,
> as you have just verified, no?  And if this somehow can happen, then
> why isn't there a similar check in the other functions?
>

refcount can be MOST_POSITIVE_FIXNUM because that's an inclusive bound.
It's important to check that case because later refcount is incremented by
one, and if it's equal to MOST_POSITIVE_FIXNUM it would be outside the
allowed range afterwards. No other function increments numbers, thus no
other functions need this check.
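
A minimal sketch of why the inclusive bound matters. The helper
increment_refcount and the constant FAKE_MOST_POSITIVE_FIXNUM are
invented for this example; the real limit depends on the Emacs build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MOST_POSITIVE_FIXNUM.  */
#define FAKE_MOST_POSITIVE_FIXNUM (INTPTR_MAX >> 2)

/* Store REFCOUNT + 1 in *RESULT and return true, or return false if
   the incremented value would no longer fit in a fixnum.  */
bool
increment_refcount (intptr_t refcount, intptr_t *result)
{
  assert (0 <= refcount && refcount <= FAKE_MOST_POSITIVE_FIXNUM);
  /* The bound is inclusive, so equality is the overflow case.  */
  if (refcount >= FAKE_MOST_POSITIVE_FIXNUM)
    return false;
  *result = refcount + 1;
  return true;
}
```

The largest valid refcount is rejected because incrementing it would
leave the fixnum range; every smaller value is incremented normally.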


>
> Re this fragment from module.c:
>
>   Lisp_Object ret =3D list4 (Qlambda,
>                            list2 (Qand_rest, Qargs),
>                            documentation ? build_string (documentation) :
> Qnil,
>                            list3 (module_call_func,
>                                   envobj,
>                                   Qargs));
>
> Thou shalt not use build_string, except when you _know_ the argument
> will always be a pure-ASCII string.  Practically, this means the
> argument must be a constant ASCII string.  See these messages (and the
> preceding discussion, if you are interested) for the gory details:
>
>   http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00955.html
>   http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00976.html
>   http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00979.html
>
> The above should call make_multibyte_string instead.
>

We had a discussion about encodings in
https://github.com/aaptel/emacs-dynamic-module/issues/37. Sorry that this
didn't get resolved earlier; it's an important point. My suggestion would
be to always mandate specifying an encoding whenever a char* is passed, and
limit that to two or three functions dealing with creating strings and
accessing string contents. Would that address your concerns?


>
> Again in module.c:
>
>   /*
>    * Emacs internal encoding is more-or-less UTF8, let's assume utf8
>    * encoded emacs string are the same byte size.
>    */
>
>   if (!buffer || length =3D=3D 0 || *length-1 < raw_size)
>     {
>       *length =3D raw_size + 1;
>       return false;
>     }
>
> I don't understand why you need this assumption.  You are going to
> encode the string in a moment, so why not test 'length' against the
> size you actually obtain there?  (The above test will misfire when the
> original string includes characters above the Unicode codespace, which
> require 5 bytes internally, but their encoding maps them to Unicode
> codepoints which cannot take more than 4 bytes.  So you might reject
> perfectly valid calls.)
>
> In module_make_string you have:
>
>   /* Assume STR is utf8 encoded */
>   return lisp_to_value (env, make_string (str, length));
>
> The discussion I pointed to above concluded that <quote>make_string is
> a bug</quote>.  So please use make_multibyte_string here instead.
>

See above; my suggestion would be to change the string handling code by
limiting encoding and decoding to a small set of functions where the
encoding would have to be specified explicitly.


>
> It looks like XUSER_PTR is used both as an lvalue and an rvalue.  This
> is different from any other object, where we have separate Xfoo and
> XSETfoo macros.  Suggest to follow suit.
>

Agreed.


>
>   static void module_set_user_finalizer (emacs_env *env,
>                                              emacs_value uptr,
>                                              emacs_finalizer_function fin=
)
>   {
>     check_main_thread ();
>     eassert (module_non_local_exit_check (env) =3D=3D
> emacs_funcall_exit_return);
>     const Lisp_Object lisp =3D value_to_lisp (uptr);
>     if (! USER_PTRP (lisp)) module_wrong_type (env, Quser_ptr, lisp);
>     XUSER_PTR (lisp)->finalizer =3D fin; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<=
<
>   }
>
> No validity checks on 'fin'?
>

How should it be validated? In C an arbitrary (invalid) pointer could be
passed. I think we just have to accept that this is UB.


>
> In module_vec_get:
>
>   /* Prevent error-prone comparison between types of different signedness=
.
> */
>   const size_t size =3D ASIZE (lvec);
>   eassert (size >=3D 0);
>
> How can the assertion be ever violated?
>

Yeah, that's a bug. I probably meant size to be declared as ptrdiff_t,
which is what ASIZE returns.
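
Roughly what I had in mind, as a sketch only. The helper index_in_range
is hypothetical and stands for the bounds check in module_vec_get. With
a signed size the assertion is a real check, and the comparison with a
signed index involves no mixed signedness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if INDEX is a valid index into a vector of SIZE
   elements.  SIZE is signed, like the result of ASIZE, so the
   assertion below is a real check rather than a tautology.  */
bool
index_in_range (ptrdiff_t size, ptrdiff_t index)
{
  assert (size >= 0);
  return 0 <= index && index < size;
}
```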


>
> In module-load:
>
>   CHECK_STRING (file);
>   handle =3D dynlib_open (SDATA (file));
>
> Every Lisp primitive that accepts file arguments _must_ call
> expand-file-name on the file name, before using it.  Otherwise,
> relative file names will produce subtle and hard-to-debug problems
> when the Lisp program calling them involves changing the current
> directory of the Emacs process.
>
> The other mandatory thing absent from the above is ENCODE_FILE.  You
> cannot pass unencoded file names to C runtime functions.
>

OK, will send a patch.


>
>   struct {
>     struct emacs_runtime pub;
>     struct emacs_runtime_private priv;
>   } runtime =3D {
>     .pub =3D {
>       .size =3D sizeof runtime.pub,
>       .get_environment =3D module_get_environment,
>       .private_members =3D &runtime.priv
>     }
>   };
>
> Is this portable enough?
>

According to http://en.cppreference.com/w/c/language/struct_initialization
 and https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html designated
initializers are part of C99, which is required for building Emacs.
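
As a small self-contained check of the construct, here is a sketch with
simplified, made-up structures (fake_runtime and fake_runtime_private
are not the real types). It uses only C99 features: designated
initializers, and the object's own name inside its initializer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified versions of the two runtime structures.  */
struct fake_runtime_private
{
  int dummy;
};

struct fake_runtime
{
  size_t size;
  struct fake_runtime_private *private_members;
};

/* Return 1 if an object built with C99 designated initializers ends
   up with the expected field values, 0 otherwise.  */
int
designated_init_works (void)
{
  struct
  {
    struct fake_runtime pub;
    struct fake_runtime_private priv;
  } runtime = {
    .pub = {
      .size = sizeof runtime.pub,
      .private_members = &runtime.priv
    }
  };
  /* Members without an initializer, such as priv, are zeroed.  */
  return runtime.pub.size == sizeof (struct fake_runtime)
         && runtime.pub.private_members == &runtime.priv
         && runtime.priv.dummy == 0;
}
```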


>
>   int r =3D module_init (&runtime.pub);
>
> I think calling this function "module_init" is sub-optimal: you have a
> void function by the same name in this source file.  How about
> renaming it to something else?
>

Agreed.


>
> Also, it seems that once a module is loaded, it cannot be unloaded
> (i.e., no unload-feature equivalent is provided).  Is that on purpose?
> An Emacs process can live for a very long time, so keeping all of the
> modules open in it at all times is not necessarily TRT.  E.g., how do
> I update to a newer version of a module?
>

Unloading is important (
https://github.com/aaptel/emacs-dynamic-module/issues/36), but for now we
decided to delay it to a later version because we expect much discussion
about the precise semantics.


>
> Shouldn't module-init call dynlib_close before it returns?  Otherwise
> we are leaking descriptors here, no?
>

My impression from reading dlclose(3) is that modules shouldn't be unloaded
while they are still used.


>
> In module-call:
>
>   const EMACS_INT len =3D XINT (Flength (arglist));
>   eassert (len >=3D 0);
>   if (len > MOST_POSITIVE_FIXNUM)
>     xsignal0 (Qoverflow_error);
>
> How can the 'if' clause ever be true?  XINT by definition cannot
> produce anything but a valid EMACS_INT, can it?
>

True. Not sure what I was thinking. (Could be replaced by an eassert, to
document the assumption.)


>
>   if (len > INT_MAX || len < envptr->min_arity || (envptr->max_arity >=3D=
 0
> && len > envptr->max_arity))
>     xsignal2 (Qwrong_number_of_arguments, module_format_fun_env (envptr),
> make_number (len));
>
> Why the test against INT_MAX?  EMACS_INT can legitimately be a 64-bit
> data type with values far exceeding INT_MAX.  Why impose this
> limitation here?
>

Because the nargs argument in the module interface is an int.
If you think functions with more than INT_MAX arguments should be
supported, the type for nargs should be changed to int64.
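
The check amounts to a guarded narrowing conversion. A sketch, with a
hypothetical helper arg_count_to_int and int64_t standing in for
EMACS_INT:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert the 64-bit argument count LEN to the int expected by a
   hypothetical module function.  Return false if LEN does not fit.  */
bool
arg_count_to_int (int64_t len, int *nargs)
{
  assert (len >= 0);
  if (len > INT_MAX)
    return false;
  *nargs = (int) len;
  return true;
}
```

Without the test against INT_MAX, the cast could silently change the
value on platforms where int is narrower than 64 bits.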


>
> allocate_emacs_value calls malloc; shouldn't it call xmalloc instead,
> or at least conform to the XMALLOC_BLOCK_INPUT_CHECK protocol?
>

If xmalloc is called, then we need to make sure that no signals (longjmps)
can escape to module code. If appropriate setjmps are in place that should
be doable, but I need to check whether there are edge cases.
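
To make the concern concrete, here is a sketch of the pattern I mean.
Everything in it is hypothetical: fake_xmalloc stands in for xmalloc,
and it treats a zero-size request as a failure only so that the error
path can be exercised.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handler that an allocation failure jumps to.  */
static jmp_buf allocation_failed;

/* Stand-in for xmalloc: never returns NULL, exits non-locally
   instead.  A zero SIZE simulates an allocation failure.  */
static void *
fake_xmalloc (size_t size)
{
  void *p = size == 0 ? NULL : malloc (size);
  if (p == NULL)
    longjmp (allocation_failed, 1);
  return p;
}

/* API boundary: catch the non-local exit and report it as a plain
   error return, so that no longjmp escapes into module code.  */
bool
allocate_for_module (size_t size, void **result)
{
  if (setjmp (allocation_failed) != 0)
    return false;
  *result = fake_xmalloc (size);
  return true;
}
```

The module only ever sees a boolean result; the non-local exit stays on
the Emacs side of the interface.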


>
> In module_format_fun_env, you produce a unibyte string, and then use
> that in calls to functions like xsignal1, which expect Lisp strings in
> their internal multibyte representation.  You should instead decode
> the unibyte string (using UTF-8) before you return it.
>

OK, will send a patch.


>
> Btw, I wonder whether we should provide a more friendly capabilities
> for retrieving the function name and its module.  dladdr is not
> portable enough, and we have all the information at our fingertips
> in the module's init function, we just need to keep it instead of
> throwing it away.  I envision debugging module-related problems will
> not be a very rare situation, so we need any help we can get.  WDYT?
>

Hmm, I don't know whether we have access to the function name without using
dladdr. The user just passes a pointer to the module, not its name.


>
> In syms_of_module:
>
>   /* Unintern `module-environments' because it is only used
>      internally. */
>   Funintern (Qmodule_environments, Qnil);
>
> What if some Lisp defines a (interned) symbol by that name?  Won't
> they clash?
>

I followed the lead of internal-interpreter-environment in eval.c, which
uses the same pattern.


>
> The Windows-specific parts of dynlib.c need work, e.g. you cannot pass
> UTF-8 encoded file names to Windows APIs.  And there are some other
> issues.  I'll take care of that.
>

I can also do that; I should be able to set up a build environment in a VM.


>
> About the tests: The Makefile in mod-test is Unix-specific: it uses a
> literal .so extension.  I also think the Python script should be
> rewritten in Emacs Lisp, so that Python installation is not required.
> Finally, all of the module tests and associated files should be moved
> into test/, preferably even test/automated/ and made part of the "make
> check" run.
>
>
Yes, tracked in https://github.com/aaptel/emacs-dynamic-module/issues/34

--001a114450c81561190524ec75f0
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Thanks for the thorough review; I&#39;ll prepare patches f=
or all of these issues.<br><br><div class=3D"gmail_quote"><div dir=3D"ltr">=
Eli Zaretskii &lt;<a href=3D"mailto:eliz@gnu.org">eliz@gnu.org</a>&gt; schr=
ieb am Do., 19. Nov. 2015 um 16:28=C2=A0Uhr:<br></div><blockquote class=3D"=
gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-=
left:1ex">Thanks for making this possible.<br>
<br>
Please allow me a few questions/comments about the dynamic modules<br>
code, based on some initial reading:<br>
<br>
Why module.c, but confusingly emacs_module.h?=C2=A0 Are there any reasons<b=
r>
not to use the same base name, like we do for other C sources?<br>
<br></blockquote><div><br></div><div>emacs_module.h is intended to be inclu=
ded by module authors. Therefore its name needs to be globally unique, whic=
h in practice means it needs to start with &#39;emacs_&#39;.</div><div>modu=
le.c could be renamed accordingly, if you prefer that. However, Aur=C3=A9li=
en picked the simple name because it&#39;s not required to be globally uniq=
ue.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin=
:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
This comment in module.c:<br>
<br>
=C2=A0 /* If checking is enabled, abort if the current thread is not the<br=
>
=C2=A0 =C2=A0 =C2=A0Emacs main thread. */<br>
=C2=A0 static void check_main_thread (void);<br>
<br>
confused me, because a later comment says:<br>
<br>
=C2=A0 void module_init (void)<br>
=C2=A0 {<br>
=C2=A0 =C2=A0 /* It is not guaranteed that dynamic initializers run in the =
main thread,<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0therefore we detect the main thread here. */<br>
<br>
If dynamic initializers might run in a thread different from the Emacs<br>
main thread, then the code in module_init will record that other<br>
thread, not the Emacs main thread, right?<br></blockquote><div><br></div><d=
iv>No, because module_init is guaranteed to be called from the main thread =
because main calls it explicitly.</div><div>=C2=A0</div><blockquote class=
=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padd=
ing-left:1ex">
<br>
Also, another comment:<br>
<br>
=C2=A0 /* On Windows, we store both a handle to the main thread and the<br>
=C2=A0 =C2=A0 =C2=A0thread ID because the latter can be reused when a threa=
d<br>
=C2=A0 =C2=A0 =C2=A0terminates. */<br>
<br>
seems to imply that &#39;main_thread&#39; here is not the Emacs&#39;s main =
thread,<br>
because that thread never terminates as long as the Emacs session is<br>
alive.<br>
<br>
So what&#39;s the deal here? what does this thread checking supposed to<br>
detect?<br></blockquote><div><br></div><div>This guards against the Emacs m=
ain thread having exited while module code in some other thread is still ru=
nning and attempting to call Emacs functions. This is undefined behavior, b=
ut we included an explicit check if checking is enabled because that case i=
s somewhat subtle.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" =
style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
In this code from module.c:<br>
<br>
=C2=A0 =C2=A0 =C2=A0 Lisp_Object value =3D HASH_VALUE (h, i);<br>
=C2=A0 =C2=A0 =C2=A0 eassert (NATNUMP (value));<br>
=C2=A0 =C2=A0 =C2=A0 const EMACS_UINT refcount =3D XFASTINT (value);<br>
=C2=A0 =C2=A0 =C2=A0 if (refcount &gt;=3D MOST_POSITIVE_FIXNUM)<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 {<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 module_non_local_exit_signal_1 (env, Qov=
erflow_error, Qnil);<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return NULL;<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }<br>
<br>
how can the &#39;if&#39; clause ever be true?=C2=A0 refcount is an Emacs in=
teger,<br>
as you have just verified, no?=C2=A0 And if this somehow can happen, then<b=
r>
why isn&#39;t there a similar check in the other functions?<br></blockquote=
><div><br></div><div>refcount can be MOST_POSITIVE_FIXNUM because that&#39;=
s an inclusive bound. It&#39;s important to check that case because later r=
efcount is incremented by one, and if it&#39;s equal to MOST_POSITIVE_FIXNU=
M it would be outside the allowed range afterwards. No other function incre=
ments numbers, thus no other functions need this check.</div><div>=C2=A0</d=
iv><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left=
:1px #ccc solid;padding-left:1ex">
<br>
Re this fragment from module.c:<br>
<br>
=C2=A0 Lisp_Object ret =3D list4 (Qlambda,<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0list2 (Qand_rest, Qargs),<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0documentation ? build_string (documentation) : Qnil=
,<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0list3 (module_call_func,<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 envobj,<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Qargs));<br>
<br>
Thou shalt not use build_string, except when you _know_ the argument<br>
will always be a pure-ASCII string.=C2=A0 Practically, this means the<br>
argument must be a constant ASCII string.=C2=A0 See these messages (and the=
<br>
preceding discussion, if you are interested) for the gory details:<br>
<br>
=C2=A0 <a href=3D"http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/m=
sg00955.html" rel=3D"noreferrer" target=3D"_blank">http://lists.gnu.org/arc=
hive/html/bug-gnu-emacs/2013-10/msg00955.html</a><br>
=C2=A0 <a href=3D"http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/m=
sg00976.html" rel=3D"noreferrer" target=3D"_blank">http://lists.gnu.org/arc=
hive/html/bug-gnu-emacs/2013-10/msg00976.html</a><br>
=C2=A0 <a href=3D"http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/m=
sg00979.html" rel=3D"noreferrer" target=3D"_blank">http://lists.gnu.org/arc=
hive/html/bug-gnu-emacs/2013-10/msg00979.html</a><br>
<br>
The above should call make_multibyte_string instead.<br></blockquote><div><=
br></div><div>We had a discussion about encodings in=C2=A0<a href=3D"https:=
//github.com/aaptel/emacs-dynamic-module/issues/37">https://github.com/aapt=
el/emacs-dynamic-module/issues/37</a>. Sorry that this didn&#39;t get resol=
ved earlier; it&#39;s an important point. My suggestion would be to always =
mandate specifying an encoding whenever a char* is passed, and limit that t=
o two or three functions dealing with creating strings and accessing string=
 contents. Would that address your concerns?</div><div>=C2=A0</div><blockqu=
ote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc s=
olid;padding-left:1ex">
<br>
Again in module.c:<br>
<br>
=C2=A0 /*<br>
=C2=A0 =C2=A0* Emacs internal encoding is more-or-less UTF8, let&#39;s assu=
me utf8<br>
=C2=A0 =C2=A0* encoded emacs string are the same byte size.<br>
=C2=A0 =C2=A0*/<br>
<br>
=C2=A0 if (!buffer || length =3D=3D 0 || *length-1 &lt; raw_size)<br>
=C2=A0 =C2=A0 {<br>
=C2=A0 =C2=A0 =C2=A0 *length =3D raw_size + 1;<br>
=C2=A0 =C2=A0 =C2=A0 return false;<br>
=C2=A0 =C2=A0 }<br>
<br>
I don&#39;t understand why you need this assumption.=C2=A0 You are going to=
<br>
encode the string in a moment, so why not test &#39;length&#39; against the=
<br>
size you actually obtain there?=C2=A0 (The above test will misfire when the=
<br>
original string includes characters above the Unicode codespace, which<br>
require 5 bytes internally, but their encoding maps them to Unicode<br>
codepoints which cannot take more than 4 bytes.=C2=A0 So you might reject<b=
r>
perfectly valid calls.)<br>
<br>
In module_make_string you have:<br>
<br>
=C2=A0 /* Assume STR is utf8 encoded */<br>
=C2=A0 return lisp_to_value (env, make_string (str, length));<br>
<br>
The discussion I pointed to above concluded that &lt;quote&gt;make_string i=
s<br>
a bug&lt;/quote&gt;.=C2=A0 So please use make_multibyte_string here instead=
.<br></blockquote><div><br></div><div>See above; my suggestion would be to =
change the string handling code by limiting encoding and decoding to a smal=
l set of functions where the encoding would have to be specified explicitly=
.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0=
 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
It looks like XUSER_PTR is used both as an lvalue and an rvalue.=C2=A0 This=
<br>
is different from any other object, where we have separate Xfoo and<br>
XSETfoo macros.=C2=A0 Suggest to follow suit.<br></blockquote><div><br></di=
v><div>Agreed.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" styl=
e=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
=C2=A0 static void module_set_user_finalizer (emacs_env *env,<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0emacs_value uptr,<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0emacs_finalizer_function fin)<br>
=C2=A0 {<br>
=C2=A0 =C2=A0 check_main_thread ();<br>
=C2=A0 =C2=A0 eassert (module_non_local_exit_check (env) =3D=3D emacs_funca=
ll_exit_return);<br>
=C2=A0 =C2=A0 const Lisp_Object lisp =3D value_to_lisp (uptr);<br>
=C2=A0 =C2=A0 if (! USER_PTRP (lisp)) module_wrong_type (env, Quser_ptr, li=
sp);<br>
=C2=A0 =C2=A0 XUSER_PTR (lisp)-&gt;finalizer =3D fin; &lt;&lt;&lt;&lt;&lt;&=
lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;=
&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;<br>
=C2=A0 }<br>
<br>
No validity checks on &#39;fin&#39;?<br></blockquote><div><br></div><div>Ho=
w should it be validated? In C an arbitrary (invalid) pointer could be pass=
ed. I think we just have to accept that this is UB.</div><div>=C2=A0</div><=
blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px=
 #ccc solid;padding-left:1ex">
<br>
In module_vec_get:<br>
<br>
=C2=A0 /* Prevent error-prone comparison between types of different signedn=
ess. */<br>
=C2=A0 const size_t size =3D ASIZE (lvec);<br>
=C2=A0 eassert (size &gt;=3D 0);<br>
<br>
How can the assertion be ever violated?<br></blockquote><div><br></div><div=
>Yeah, that&#39;s a bug. I probably meant size to be declared as ptrdiff_t,=
 which is what ASIZE returns.</div><div>=C2=A0</div><blockquote class=3D"gm=
ail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-le=
ft:1ex">
<br>
In module-load:<br>
<br>
=C2=A0 CHECK_STRING (file);<br>
=C2=A0 handle =3D dynlib_open (SDATA (file));<br>
<br>
Every Lisp primitive that accepts file arguments _must_ call<br>
expand-file-name on the file name, before using it.=C2=A0 Otherwise,<br>
relative file names will produce subtle and hard-to-debug problems<br>
when the Lisp program calling them involves changing the current<br>
directory of the Emacs process.<br>
<br>
The other mandatory thing absent from the above is ENCODE_FILE.=C2=A0 You<b=
r>
cannot pass unencoded file names to C runtime functions.<br></blockquote><d=
iv><br></div><div>OK, will send a patch.</div><div>=C2=A0</div><blockquote =
class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid=
;padding-left:1ex">
<br>
=C2=A0 struct {<br>
=C2=A0 =C2=A0 struct emacs_runtime pub;<br>
=C2=A0 =C2=A0 struct emacs_runtime_private priv;<br>
=C2=A0 } runtime =3D {<br>
=C2=A0 =C2=A0 .pub =3D {<br>
=C2=A0 =C2=A0 =C2=A0 .size =3D sizeof runtime.pub,<br>
=C2=A0 =C2=A0 =C2=A0 .get_environment =3D module_get_environment,<br>
=C2=A0 =C2=A0 =C2=A0 .private_members =3D &amp;runtime.priv<br>
=C2=A0 =C2=A0 }<br>
=C2=A0 };<br>
<br>
Is this portable enough?<br></blockquote><div><br></div><div>According to=
=C2=A0<a href=3D"http://en.cppreference.com/w/c/language/struct_initializat=
ion">http://en.cppreference.com/w/c/language/struct_initialization</a>=C2=
=A0and=C2=A0<a href=3D"https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.=
html">https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html</a>=C2=A0des=
ignated initializers are part of C99, which is required for building Emacs.=
</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0 =
0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
=C2=A0 int r =3D module_init (&amp;runtime.pub);<br>
<br>
I think calling this function &quot;module_init&quot; is sub-optimal: you h=
ave a<br>
void function by the same name in this source file.=C2=A0 How about<br>
renaming it to something else?<br></blockquote><div><br></div><div>Agree</d=
iv><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0=
 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Also, it seems that once a module is loaded, it cannot be unloaded<br>
(i.e., no unload-feature equivalent is provided).=C2=A0 Is that on purpose?=
<br>
An Emacs process can live for a very long time, so keeping all of the<br>
modules open in it at all times is not necessarily TRT.=C2=A0 E.g., how do<=
br>
I update to a newer version of a module?<br></blockquote><div><br></div><di=
v>Unloading is important (<a href=3D"https://github.com/aaptel/emacs-dynami=
c-module/issues/36">https://github.com/aaptel/emacs-dynamic-module/issues/3=
6</a>), but for now we decided to delay it to a later version because we ex=
pect much discussion about the precise semantics.</div><div>=C2=A0</div><bl=
ockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #=
ccc solid;padding-left:1ex">
<br>
Shouldn&#39;t module-init call dynlib_close before it returns?=C2=A0 Otherw=
ise<br>
we are leaking descriptors here, no?<br></blockquote><div><br></div><div>My=
 impression from reading dlclose(3) is that modules shouldn&#39;t be unload=
ed while they are still used.</div><div>=C2=A0</div><blockquote class=3D"gm=
ail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-le=
ft:1ex">
<br>
In module-call:<br>
<br>
=C2=A0 const EMACS_INT len =3D XINT (Flength (arglist));<br>
=C2=A0 eassert (len &gt;=3D 0);<br>
=C2=A0 if (len &gt; MOST_POSITIVE_FIXNUM)<br>
=C2=A0 =C2=A0 xsignal0 (Qoverflow_error);<br>
<br>
How can the &#39;if&#39; clause ever be true?=C2=A0 XINT by definition cann=
ot<br>
produce anything but a valid EMACS_INT, can it?<br></blockquote><div><br></=
div><div>True. Not sure what I was thinking. (Could be replaced by an easse=
rt, to document the assumption.)</div><div>=C2=A0</div><blockquote class=3D=
"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding=
-left:1ex">
<br>
=C2=A0 if (len &gt; INT_MAX || len &lt; envptr-&gt;min_arity || (envptr-&gt=
;max_arity &gt;=3D 0 &amp;&amp; len &gt; envptr-&gt;max_arity))<br>
=C2=A0 =C2=A0 xsignal2 (Qwrong_number_of_arguments, module_format_fun_env (=
envptr), make_number (len));<br>
<br>
Why the test against INT_MAX?=C2=A0 EMACS_INT can legitimately be a 64-bit<=
br>
data type with values far exceeding INT_MAX.=C2=A0 Why impose this<br>
limitation here?<br></blockquote><div><br></div><div>Because the nargs argu=
ment in the module interface is an int.</div><div>If you think functions wi=
th more than INT_MAX arguments should be supported, the type for nargs shou=
ld be changed to int64.</div><div>=C2=A0</div><blockquote class=3D"gmail_qu=
ote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex=
">
<br>
allocate_emacs_value calls malloc; shouldn&#39;t it call xmalloc instead,<b=
r>
or at least conform to the XMALLOC_BLOCK_INPUT_CHECK protocol?<br></blockqu=
ote><div><br></div><div>If xmalloc is called, then we need to make sure tha=
t no signals (longjmps) can escape to module code. If appropriate setjmps a=
re in place, that should be doable, but I need to check whether there are=
 ed=
ge cases.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"=
margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
In module_format_fun_env, you produce a unibyte string, and then use<br>
that in calls to functions like xsignal1, which expect Lisp strings in<br>
their internal multibyte representation.=C2=A0 You should instead decode<br=
>
the unibyte string (using UTF-8) before you return it.<br></blockquote><div=
><br></div><div>OK, will send a patch.</div><div>=C2=A0</div><blockquote cl=
ass=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;p=
adding-left:1ex">
<br>
Btw, I wonder whether we should provide friendlier capabilities<br>
for retrieving the function name and its module.=C2=A0 dladdr is not<br>
portable enough, and we have all the information at our fingertips<br>
in the module&#39;s init function, we just need to keep it instead of<br>
throwing it away.=C2=A0 I envision debugging module-related problems will<b=
r>
not be a very rare situation, so we need any help we can get.=C2=A0 WDYT?<b=
r></blockquote><div><br></div><div>Hmm, I don&#39;t know whether we have ac=
cess to the function name without using dladdr. The user just passes a func=
tion pointer to the module API, not the function&#39;s name.</div><div>=
=C2=A0</div><blockquote class=3D=
"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding=
-left:1ex">
<br>
In syms_of_module:<br>
<br>
=C2=A0 /* Unintern `module-environments&#39; because it is only used<br>
=C2=A0 =C2=A0 =C2=A0internally. */<br>
=C2=A0 Funintern (Qmodule_environments, Qnil);<br>
<br>
What if some Lisp defines a (interned) symbol by that name?=C2=A0 Won&#39;t=
<br>
they clash?<br></blockquote><div><br></div><div>I followed the lead of inte=
rnal-interpreter-environment in eval.c, which uses the same pattern.</div><=
div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8e=
x;border-left:1px #ccc solid;padding-left:1ex">
<br>
The Windows-specific parts of dynlib.c need work, e.g. you cannot pass<br>
UTF-8 encoded file names to Windows APIs.=C2=A0 And there are some other<br=
>
issues.=C2=A0 I&#39;ll take care of that.<br></blockquote><div><br></div><d=
iv>I can also do that, I should be able to set up a build environment in a =
VM.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin=
:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
About the tests: The Makefile in mod-test is Unix-specific: it uses a<br>
literal .so extension.=C2=A0 I also think the Python script should be<br>
rewritten in Emacs Lisp, so that Python installation is not required.<br>
Finally, all of the module tests and associated files should be moved<br>
into test/, preferably even test/automated/ and made part of the &quot;make=
<br>
check&quot; run.<br>
<br></blockquote><div><br></div><div>Yes, tracked in <a href=3D"https://git=
hub.com/aaptel/emacs-dynamic-module/issues/34">https://github.com/aaptel/em=
acs-dynamic-module/issues/34</a>=C2=A0</div></div></div>

--001a114450c81561190524ec75f0--