From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Philipp Stephani
make_string is > a bug. So please use make_multibyte_string here instead. > See above; my suggestion would be to change the string handling code by limiting encoding and decoding to a small set of functions where the encoding would have to be specified explicitly. > > It looks like XUSER_PTR is used both as an lvalue and an rvalue. This > is different from any other object, where we have separate Xfoo and > XSETfoo macros. Suggest to follow suit. > Agreed. > > static void module_set_user_finalizer (emacs_env *env, > emacs_value uptr, > emacs_finalizer_function fin= ) > { > check_main_thread (); > eassert (module_non_local_exit_check (env) =3D=3D > emacs_funcall_exit_return); > const Lisp_Object lisp =3D value_to_lisp (uptr); > if (! USER_PTRP (lisp)) module_wrong_type (env, Quser_ptr, lisp); > XUSER_PTR (lisp)->finalizer =3D fin; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<= < > } > > No validity checks on 'fin'? > How should it be validated? In C an arbitrary (invalid) pointer could be passed. I think we just have to accept that this is UB. > > In module_vec_get: > > /* Prevent error-prone comparison between types of different signedness= . > */ > const size_t size =3D ASIZE (lvec); > eassert (size >=3D 0); > > How can the assertion be ever violated? > Yeah, that's a bug. I probably meant size to be declared as ptrdiff_t, which is what ASIZE returns. > > In module-load: > > CHECK_STRING (file); > handle =3D dynlib_open (SDATA (file)); > > Every Lisp primitive that accepts file arguments _must_ call > expand-file-name on the file name, before using it. Otherwise, > relative file names will produce subtle and hard-to-debug problems > when the Lisp program calling them involves changing the current > directory of the Emacs process. > > The other mandatory thing absent from the above is ENCODE_FILE. You > cannot pass unencoded file names to C runtime functions. > OK, will send a patch. > > struct { > struct emacs_runtime pub; > struct emacs_runtime_private priv; > } runtime =3D { > .pub =3D { > .size =3D sizeof runtime.pub, > .get_environment =3D module_get_environment, > .private_members =3D &runtime.priv > } > }; > > Is this portable enough? > According to http://en.cppreference.com/w/c/language/struct_initialization and https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html designated initializers are part of C99, which is required for building Emacs. > > int r =3D module_init (&runtime.pub); > > I think calling this function "module_init" is sub-optimal: you have a > void function by the same name in this source file. How about > renaming it to something else? > Agree > > Also, it seems that once a module is loaded, it cannot be unloaded > (i.e., no unload-feature equivalent is provided). Is that on purpose? > An Emacs process can live for a very long time, so keeping all of the > modules open in it at all times is not necessarily TRT. E.g., how do > I update to a newer version of a module? > Unloading is important ( https://github.com/aaptel/emacs-dynamic-module/issues/36), but for now we decided to delay it to a later version because we expect much discussion about the precise semantics. > > Shouldn't module-init call dynlib_close before it returns? Otherwise > we are leaking descriptors here, no? > My impression from reading dlclose(3) is that modules shouldn't be unloaded while they are still used. > > In module-call: > > const EMACS_INT len =3D XINT (Flength (arglist)); > eassert (len >=3D 0); > if (len > MOST_POSITIVE_FIXNUM) > xsignal0 (Qoverflow_error); > > How can the 'if' clause ever be true? XINT by definition cannot > produce anything but a valid EMACS_INT, can it? > True. Not sure what I was thinking. (Could be replaced by an eassert, to document the assumption.) > > if (len > INT_MAX || len < envptr->min_arity || (envptr->max_arity >=3D= 0 > && len > envptr->max_arity)) > xsignal2 (Qwrong_number_of_arguments, module_format_fun_env (envptr), > make_number (len)); > > Why the test against INT_MAX? EMACS_INT can legitimately be a 64-bit > data type with values far exceeding INT_MAX. Why impose this > limitation here? > Because the nargs argument in the module interface is an int. If you think functions with more than INT_MAX arguments should be supported, the type for nargs should be changed to int64. > > allocate_emacs_value calls malloc; shouldn't it call xmalloc instead, > or at least conform to the XMALLOC_BLOCK_INPUT_CHECK protocol? > If xmalloc is called, then we need to make sure that no signals (longjmps) can escape to module code. If appropriate setjmps are in place that should be doable, but I need to check whether there are edge cases. > > In module_format_fun_env, you produce a unibyte string, and then use > that in calls to functions like xsignal1, which expect Lisp strings in > their internal multibyte representation. You should instead decode > the unibyte string (using UTF-8) before you return it. > OK, will send a patch. > > Btw, I wonder whether we should provide a more friendly capabilities > for retrieving the function name and its module. dladdr is not > portable enough, and we have all the information at our fingertips > in the module's init function, we just need to keep it instead of > throwing it away. I envision debugging module-related problems will > not be a very rare situation, so we need any help we can get. WDYT? > Hmm, I don't know whether we have access to the function name without using dladdr. The user just passes a pointer to the module, not its name. > > In syms_of_module: > > /* Unintern `module-environments' because it is only used > internally. */ > Funintern (Qmodule_environments, Qnil); > > What if some Lisp defines a (interned) symbol by that name? Won't > they clash? > I followed the lead of internal-interpreter-environment in eval.c, which uses the same pattern. > > The Windows-specific parts of dynlib.c need work, e.g. you cannot pass > UTF-8 encoded file names to Windows APIs. And there are some other > issues. I'll take care of that. > I can also do that, I should be able to set up a build environment in a VM. > > About the tests: The Makefile in mod-test is Unix-specific: it uses a > literal .so extension. I also think the Python script should be > rewritten in Emacs Lisp, so that Python installation is not required. > Finally, all of the module tests and associated files should be moved > into test/, preferably even test/automated/ and made part of the "make > check" run. > > Yes, tracked in https://github.com/aaptel/emacs-dynamic-module/issues/34 --001a114450c81561190524ec75f0 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Thanks for making this possible.
Please allow me a few questions/comments about the dynamic modules
code, based on some initial reading:
Why module.c, but confusingly emacs_module.h?=C2=A0 Are there any reasons not to use the same base name, like we do for other C sources?
This comment in module.c:
=C2=A0 /* If checking is enabled, abort if the current thread is not the
=C2=A0 =C2=A0 =C2=A0Emacs main thread. */
=C2=A0 static void check_main_thread (void);
confused me, because a later comment says:
=C2=A0 void module_init (void)
=C2=A0 {
=C2=A0 =C2=A0 /* It is not guaranteed that dynamic initializers run in the = main thread,
=C2=A0 =C2=A0 =C2=A0 =C2=A0therefore we detect the main thread here. */
If dynamic initializers might run in a thread different from the Emacs
main thread, then the code in module_init will record that other
thread, not the Emacs main thread, right?
Also, another comment:
=C2=A0 /* On Windows, we store both a handle to the main thread and the
=C2=A0 =C2=A0 =C2=A0thread ID because the latter can be reused when a threa= d
=C2=A0 =C2=A0 =C2=A0terminates. */
seems to imply that 'main_thread' here is not the Emacs's main = thread,
because that thread never terminates as long as the Emacs session is
alive.
So what's the deal here? what does this thread checking supposed to
detect?
In this code from module.c:
=C2=A0 =C2=A0 =C2=A0 Lisp_Object value =3D HASH_VALUE (h, i);
=C2=A0 =C2=A0 =C2=A0 eassert (NATNUMP (value));
=C2=A0 =C2=A0 =C2=A0 const EMACS_UINT refcount =3D XFASTINT (value);
=C2=A0 =C2=A0 =C2=A0 if (refcount >=3D MOST_POSITIVE_FIXNUM)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 module_non_local_exit_signal_1 (env, Qov= erflow_error, Qnil);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return NULL;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
how can the 'if' clause ever be true?=C2=A0 refcount is an Emacs in= teger,
as you have just verified, no?=C2=A0 And if this somehow can happen, then why isn't there a similar check in the other functions?
Re this fragment from module.c:
=C2=A0 Lisp_Object ret =3D list4 (Qlambda,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0list2 (Qand_rest, Qargs),
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0documentation ? build_string (documentation) : Qnil= ,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0list3 (module_call_func,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 envobj,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Qargs));
Thou shalt not use build_string, except when you _know_ the argument
will always be a pure-ASCII string.=C2=A0 Practically, this means the
argument must be a constant ASCII string.=C2=A0 See these messages (and the=
preceding discussion, if you are interested) for the gory details:
=C2=A0 http://lists.gnu.org/arc= hive/html/bug-gnu-emacs/2013-10/msg00955.html
=C2=A0 http://lists.gnu.org/arc= hive/html/bug-gnu-emacs/2013-10/msg00976.html
=C2=A0 http://lists.gnu.org/arc= hive/html/bug-gnu-emacs/2013-10/msg00979.html
The above should call make_multibyte_string instead.
It looks like XUSER_PTR is used both as an lvalue and an rvalue.=C2=A0 This=
is different from any other object, where we have separate Xfoo and
XSETfoo macros.=C2=A0 Suggest to follow suit.
=C2=A0 static void module_set_user_finalizer (emacs_env *env,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0emacs_value uptr,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0emacs_finalizer_function fin)
=C2=A0 {
=C2=A0 =C2=A0 check_main_thread ();
=C2=A0 =C2=A0 eassert (module_non_local_exit_check (env) =3D=3D emacs_funca= ll_exit_return);
=C2=A0 =C2=A0 const Lisp_Object lisp =3D value_to_lisp (uptr);
=C2=A0 =C2=A0 if (! USER_PTRP (lisp)) module_wrong_type (env, Quser_ptr, li= sp);
=C2=A0 =C2=A0 XUSER_PTR (lisp)->finalizer =3D fin; <<<<<&= lt;<<<<<<<<<<<<<<<<<<= <<<<<<<<<
=C2=A0 }
No validity checks on 'fin'?
In module-load:
=C2=A0 CHECK_STRING (file);
=C2=A0 handle =3D dynlib_open (SDATA (file));
Every Lisp primitive that accepts file arguments _must_ call
expand-file-name on the file name, before using it.=C2=A0 Otherwise,
relative file names will produce subtle and hard-to-debug problems
when the Lisp program calling them involves changing the current
directory of the Emacs process.
The other mandatory thing absent from the above is ENCODE_FILE.=C2=A0 You cannot pass unencoded file names to C runtime functions.
=C2=A0 struct {
=C2=A0 =C2=A0 struct emacs_runtime pub;
=C2=A0 =C2=A0 struct emacs_runtime_private priv;
=C2=A0 } runtime =3D {
=C2=A0 =C2=A0 .pub =3D {
=C2=A0 =C2=A0 =C2=A0 .size =3D sizeof runtime.pub,
=C2=A0 =C2=A0 =C2=A0 .get_environment =3D module_get_environment,
=C2=A0 =C2=A0 =C2=A0 .private_members =3D &runtime.priv
=C2=A0 =C2=A0 }
=C2=A0 };
Is this portable enough?
=C2=A0 int r =3D module_init (&runtime.pub);
I think calling this function "module_init" is sub-optimal: you h= ave a
void function by the same name in this source file.=C2=A0 How about
renaming it to something else?
Also, it seems that once a module is loaded, it cannot be unloaded
(i.e., no unload-feature equivalent is provided).=C2=A0 Is that on purpose?=
An Emacs process can live for a very long time, so keeping all of the
modules open in it at all times is not necessarily TRT.=C2=A0 E.g., how do<= br> I update to a newer version of a module?
In module-call:
=C2=A0 const EMACS_INT len =3D XINT (Flength (arglist));
=C2=A0 eassert (len >=3D 0);
=C2=A0 if (len > MOST_POSITIVE_FIXNUM)
=C2=A0 =C2=A0 xsignal0 (Qoverflow_error);
How can the 'if' clause ever be true?=C2=A0 XINT by definition cann= ot
produce anything but a valid EMACS_INT, can it?
=C2=A0 if (len > INT_MAX || len < envptr->min_arity || (envptr->= ;max_arity >=3D 0 && len > envptr->max_arity))
=C2=A0 =C2=A0 xsignal2 (Qwrong_number_of_arguments, module_format_fun_env (= envptr), make_number (len));
Why the test against INT_MAX?=C2=A0 EMACS_INT can legitimately be a 64-bit<= br> data type with values far exceeding INT_MAX.=C2=A0 Why impose this
limitation here?
allocate_emacs_value calls malloc; shouldn't it call xmalloc instead, or at least conform to the XMALLOC_BLOCK_INPUT_CHECK protocol?If xmalloc is called, then we need to make sure tha= t no signals (longjmps) can escape to module code. If appropriate setjmps a= re in place that should be doable, but I need to check whether there are ed= ge cases.=C2=A0
In module_format_fun_env, you produce a unibyte string, and then use
that in calls to functions like xsignal1, which expect Lisp strings in
their internal multibyte representation.=C2=A0 You should instead decode
the unibyte string (using UTF-8) before you return it.OK, will send a patch.=C2=A0
Btw, I wonder whether we should provide a more friendly capabilities
for retrieving the function name and its module.=C2=A0 dladdr is not
portable enough, and we have all the information at our fingertips
in the module's init function, we just need to keep it instead of
throwing it away.=C2=A0 I envision debugging module-related problems will not be a very rare situation, so we need any help we can get.=C2=A0 WDYT?Hmm, I don't know whether we have ac= cess to the function name without using dladdr. The user just passes a poin= ter to the module, not its name.=C2=A0
In syms_of_module:
=C2=A0 /* Unintern `module-environments' because it is only used
=C2=A0 =C2=A0 =C2=A0internally. */
=C2=A0 Funintern (Qmodule_environments, Qnil);
What if some Lisp defines a (interned) symbol by that name?=C2=A0 Won't=
they clash?I followed the lead of inte= rnal-interpreter-environment in eval.c, which uses the same pattern.<= div>=C2=A0
The Windows-specific parts of dynlib.c need work, e.g. you cannot pass
UTF-8 encoded file names to Windows APIs.=C2=A0 And there are some other
issues.=C2=A0 I'll take care of that.
About the tests: The Makefile in mod-test is Unix-specific: it uses a
literal .so extension.=C2=A0 I also think the Python script should be
rewritten in Emacs Lisp, so that Python installation is not required.
Finally, all of the module tests and associated files should be moved
into test/, preferably even test/automated/ and made part of the "make=
check" run.