From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Philipp Stephani Newsgroups: gmane.emacs.devel Subject: Re: Dynamic loading progress Date: Fri, 20 Nov 2015 19:58:25 +0000 Message-ID: References: <83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com> <877flswse5.fsf@lifelogs.com> <8737wgw7kf.fsf@lifelogs.com> <87io5bv1it.fsf@lifelogs.com> <87egfzuwca.fsf@lifelogs.com> <876118u6f2.fsf@lifelogs.com> <8737w3qero.fsf@lifelogs.com> <831tbn9g9j.fsf@gnu.org> <878u5upw7o.fsf@lifelogs.com> <83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org> <837fld6lps.fsf@gnu.org> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=089e0102def4f414030524fe4c0b X-Trace: ger.gmane.org 1448049556 9513 80.91.229.3 (20 Nov 2015 19:59:16 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Fri, 20 Nov 2015 19:59:16 +0000 (UTC) Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Nov 20 20:59:06 2015 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Zzrpd-0000nG-63 for ged-emacs-devel@m.gmane.org; Fri, 20 Nov 2015 20:58:53 +0100 Original-Received: from localhost ([::1]:49619 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Zzrpc-0006Wl-LT for ged-emacs-devel@m.gmane.org; Fri, 20 Nov 2015 14:58:52 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:56372) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZzrpU-0006TO-Aw for emacs-devel@gnu.org; Fri, 20 Nov 2015 14:58:47 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZzrpQ-00077F-FK for emacs-devel@gnu.org; Fri, 20 Nov 2015 14:58:44 -0500 Original-Received: from mail-wm0-x233.google.com ([2a00:1450:400c:c09::233]:33626) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZzrpL-000722-Jw; Fri, 20 Nov 2015 14:58:35 -0500 Original-Received: by wmec201 with SMTP id c201so86950540wme.0; Fri, 20 Nov 2015 11:58:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-type; bh=C9rZaEgsLlM8FDgEJiss8xh71PhmwbXbK8lpQrgSY0U=; b=B0+OhlECV4bXi4Dt/UtCOo7dhvPA3WlEwqiCxsBg7F8Jrw4nV3o+1t47oUcby7t14m aWugnQSvl7NskV/sddrGnJspGLMU5a01K8RRjiQgL7Ag/B0Omt2m/IJ3IfW56tlnKV4Q W31R+IgXLfH9BE1u9OyzO3kW1y67v+ONfzkIkci+vmNuWXXYShu9H+TM9VvPOfAGUfkv byvJE7vy28D6IxQn2b2AkNI38Hxht4xxeW8dOXUzHdZVACtZcL+VNx/5EHPM3EdLJZ92 dUJGgTYMe/J0WUn2QSQdgRey8UV5xTaurEwiJhUUEA111a5j6OR/j1CPdFvBdpuOuY9Q scZg== X-Received: by 10.194.87.39 with SMTP id u7mr16602736wjz.11.1448049514908; Fri, 20 Nov 2015 11:58:34 -0800 (PST) In-Reply-To: <837fld6lps.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2a00:1450:400c:c09::233 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:194890 Archived-At: --089e0102def4f414030524fe4c0b Content-Type: text/plain; charset=UTF-8 Eli Zaretskii schrieb am Fr., 20. Nov. 2015 um 10:54 Uhr: > > From: Philipp Stephani > > Date: Thu, 19 Nov 2015 22:41:09 +0000 > > Cc: emacs-devel@gnu.org > > > > Thanks for the thorough review; I'll prepare patches for all of these > issues. > > Thank you. > > > This comment in module.c: > > > > /* If checking is enabled, abort if the current thread is not the > > Emacs main thread. */ > > static void check_main_thread (void); > > > > confused me, because a later comment says: > > > > void module_init (void) > > { > > /* It is not guaranteed that dynamic initializers run in the main > thread, > > therefore we detect the main thread here. */ > > > > If dynamic initializers might run in a thread different from the > Emacs > > main thread, then the code in module_init will record that other > > thread, not the Emacs main thread, right? > > > > No, because module_init is guaranteed to be called from the main thread > because > > main calls it explicitly. > > So you are saying that module_init runs in the main thread, but > dynamic initializers might run in another thread? How's that > possible? > According to http://stackoverflow.com/a/29776416 it is not guaranteed that dynamic initializers run in the main thread. > > > Also, another comment: > > > > /* On Windows, we store both a handle to the main thread and the > > thread ID because the latter can be reused when a thread > > terminates. */ > > > > seems to imply that 'main_thread' here is not the Emacs's main > thread, > > because that thread never terminates as long as the Emacs session is > > alive. > > > > So what's the deal here? what does this thread checking supposed to > > detect? > > > > This guards against the Emacs main thread having exited while module > code in > > some other thread is still running and attempting to call Emacs > functions. > > On what OS can this happen? It cannot happen on Windows, AFAIK, http://blogs.msdn.com/b/oldnewthing/archive/2010/08/27/10054832.aspx seems to indicate that it is possible that other threads continue running after the main thread has exited. > so > complicating the code by using a handle is not necessary on Windows, > because the thread ID of the main Emacs thread will never be reused as > long as the Emacs session is alive. Moreover, we already have that > thread's ID in a global variable called dwMainThreadId, computed > during the session startup (albeit after module_init, so it would need > to be moved to an earlier stage), so we could simply compare against > its value in check_main_thread. > My initial code didn't use a thread identifier at all because it could have been reused after the main thread ended, introducing a subtle bug. > > > This is undefined behavior, but we included an explicit check if > > checking is enabled because that case is somewhat subtle. > > I'm surprised this could happen on some OS. Can you point to any > details about this? > It's not ruled out by the C standard or the APIs of operating systems. If Emacs doesn't call ExitThread or manipulates the C runtime, then indeed it can't happen (at least on Windows), but I think it's brittle to rely on such subtleties without explicitly checking for them (somebody might introduce a call to ExitThread in the future without looking at module code). > > > In this code from module.c: > > > > Lisp_Object value = HASH_VALUE (h, i); > > eassert (NATNUMP (value)); > > const EMACS_UINT refcount = XFASTINT (value); > > if (refcount >= MOST_POSITIVE_FIXNUM) > > { > > module_non_local_exit_signal_1 (env, Qoverflow_error, Qnil); > > return NULL; > > } > > > > how can the 'if' clause ever be true? refcount is an Emacs integer, > > as you have just verified, no? And if this somehow can happen, then > > why isn't there a similar check in the other functions? > > > > refcount can be MOST_POSITIVE_FIXNUM because that's an inclusive bound. > It's > > important to check that case because later refcount is incremented by > one, and > > if it's equal to MOST_POSITIVE_FIXNUM it would be outside the allowed > range > > afterwards. No other function increments numbers, thus no other > functions need > > this check. > > OK, but then either there should have been a comment explaining this, > or, better, the test should have been after the addition. (Which, by > some strange luck -- or maybe something else -- is just what Paul > already did ;-) > If the test gets moved after the addition, then we should have a verify(MAX_EMACS_UINT - 1 > MOST_POSITIVE_FIXNUM) or use __builtin_add_overflow to make sure the addition doesn't cause UB. > > > Re this fragment from module.c: > > > > Lisp_Object ret = list4 (Qlambda, > > list2 (Qand_rest, Qargs), > > documentation ? build_string (documentation) : Qnil, > > list3 (module_call_func, > > envobj, > > Qargs)); > > > > Thou shalt not use build_string, except when you _know_ the argument > > will always be a pure-ASCII string. Practically, this means the > > argument must be a constant ASCII string. See these messages (and the > > preceding discussion, if you are interested) for the gory details: > > > > > http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00955.html > > > http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00976.html > > > http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00979.html > > > > The above should call make_multibyte_string instead. > > > > > > We had a discussion about encodings in > > https://github.com/aaptel/emacs-dynamic-module/issues/37. Sorry that > this > > didn't get resolved earlier; it's an important point. My suggestion > would be to > > always mandate specifying an encoding whenever a char* is passed, and > limit > > that to two or three functions dealing with creating strings and > accessing > > string contents. Would that address your concerns? > > No, this is not about encoding at all. This is about calling > build_string: you should only call it when the argument is a fixed > ASCII string. If the argument is not fixed or might include non-ASCII > characters, you should call make_multibyte_string instead. That's > because build_string might decide on its own whether to produce a > unibyte or a multibyte string, out of your control, whereas we always > want a multibyte string in this context. > > build_string doesn't encode or decode its argument, it creates a Lisp > object whose text is taken from the argument. It's similar to > make_number in that respect. > I'll send a patch with my changes to explain my intention. > > > Again in module.c: > > > > /* > > * Emacs internal encoding is more-or-less UTF8, let's assume utf8 > > * encoded emacs string are the same byte size. > > */ > > > > if (!buffer || length == 0 || *length-1 < raw_size) > > { > > *length = raw_size + 1; > > return false; > > } > > > > I don't understand why you need this assumption. You are going to > > encode the string in a moment, so why not test 'length' against the > > size you actually obtain there? (The above test will misfire when the > > original string includes characters above the Unicode codespace, > which > > require 5 bytes internally, but their encoding maps them to Unicode > > codepoints which cannot take more than 4 bytes. So you might reject > > perfectly valid calls.) > > > > In module_make_string you have: > > > > /* Assume STR is utf8 encoded */ > > return lisp_to_value (env, make_string (str, length)); > > > > The discussion I pointed to above concluded that make_string > is > > a bug. So please use make_multibyte_string here instead. > > > > > > See above; my suggestion would be to change the string handling code by > > limiting encoding and decoding to a small set of functions where the > encoding > > would have to be specified explicitly. > > Once again, that's not the issue. AFAIU, you have already specified > (albeit implicitly) that all strings passed as arguments to and from > the module functions are UTF-8 encoded. And that's fine by me, that's > not what these comments are about. > > The first comment is about a minor issue in > module_copy_string_contents. Consider this fragment from that > function: > > ptrdiff_t raw_size = SBYTES (lisp_str); > > /* Emacs internal encoding is more-or-less UTF8, let's assume utf8 > encoded emacs string are the same byte size. */ > > if (!buffer || length == 0 || *length-1 < raw_size) > { > *length = raw_size + 1; > return false; > } > > Lisp_Object lisp_str_utf8 = ENCODE_UTF_8 (lisp_str); > eassert (raw_size == SBYTES (lisp_str_utf8)); > *length = raw_size + 1; > memcpy (buffer, SDATA (lisp_str_utf8), SBYTES (lisp_str_utf8)); > buffer[raw_size] = 0; > > The comment and the 'if' clause after it in effect tell that you want > to check whether 'buffer' has enough space to hold the encoded string, > but you don't know how many bytes will be needed for that. So you > make some assumption (which, as I pointed out, could be wrong in rare > cases), and reject the strings that fail your test. What I'm sating > is that after the call to ENCODE_UTF_8, you can compute the exact > length in bytes of the UTF-8 encoded string, so if you move the test > there, you won't have to make any assumptions and you won't err. > > The second comment is again about calling make_string (which > build_string calls internally): use make_multibyte_string instead. > > > static void module_set_user_finalizer (emacs_env *env, > > emacs_value uptr, > > emacs_finalizer_function fin) > > { > > check_main_thread (); > > eassert (module_non_local_exit_check (env) == > emacs_funcall_exit_return); > > const Lisp_Object lisp = value_to_lisp (uptr); > > if (! USER_PTRP (lisp)) module_wrong_type (env, Quser_ptr, lisp); > > XUSER_PTR (lisp)->finalizer = fin; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > } > > > > No validity checks on 'fin'? > > > > How should it be validated? In C an arbitrary (invalid) pointer could be > > passed. I think we just have to accept that this is UB. > > Given the extensive checks elsewhere in the file, this surprised me, > that's all. How to test it? calling valid_pointer_p would be one > thing I'd suggest. Maybe there are other (perhaps platform-dependent) > checks possible here, I would have to look around. For example, it > would be good to make sure this points into executable code. > You can't validate pointers in C. For example, is the following pointer valid? long a; double *p = (double *)(&a); The answer is no; this is an aliasing violation, and dereferencing p is UB, but you won't detect this by trying to read the process's address space. See also http://blogs.msdn.com/b/oldnewthing/archive/2006/09/27/773741.aspx. The best way to detect such issues is running under asan. > > > if (len > INT_MAX || len < envptr->min_arity || (envptr->max_arity > >= 0 && > > len > envptr->max_arity)) > > xsignal2 (Qwrong_number_of_arguments, module_format_fun_env (envptr), > > make_number (len)); > > > > Why the test against INT_MAX? EMACS_INT can legitimately be a 64-bit > > data type with values far exceeding INT_MAX. Why impose this > > limitation here? > > > > Because the nargs argument in the module interface is an int. > > Shouldn't it be EMACS_INT instead? > EMACS_INT is an internal type that isn't exposed to module authors. It should be either int or int64. > > > If you think functions with more than INT_MAX arguments should be > supported, > > the type for nargs should be changed to int64. > > I thought your design decision was already to use int64 for any > integer argument, no? > Daniel's initial design ( https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00960.html) had int, but I guess he wouldn't mind changing this to int64. > > Anyway, something that comes from Lisp should allow EMACS_INT, IMO. > It can't use EMACS_INT because that's an internal type defined in lisp.h, and emacs-module.h can't include lisp.h. Furthermore is should be a type with a static size to allow for ABI stability. Mismatches should be dealt with by explicit bounds checks. > > > allocate_emacs_value calls malloc; shouldn't it call xmalloc instead, > > or at least conform to the XMALLOC_BLOCK_INPUT_CHECK protocol? > > > > If xmalloc is called, then we need to make sure that no signals > (longjmps) can > > escape to module code. If appropriate setjmps are in place that should be > > doable, but I need to check whether there are edge cases. > > Paul already addressed that: xmalloc in Emacs does double or even > triple duty, and it would be a shame to lose that here. So I think we > should try to find a way to modify xmalloc such that it satisfies the > module code as well. > Could xmalloc grow a variant that is guaranteed not to call longjmp? > > > Btw, I wonder whether we should provide a more friendly capabilities > > for retrieving the function name and its module. dladdr is not > > portable enough, and we have all the information at our fingertips > > in the module's init function, we just need to keep it instead of > > throwing it away. I envision debugging module-related problems will > > not be a very rare situation, so we need any help we can get. WDYT? > > > > Hmm, I don't know whether we have access to the function name without > using > > dladdr. > > I thought about recording their names when you bind them to Lisp > symbols. > That would be nice, however it would require set_symbol_function, Fdefalias, etc. to explicitly check for module functions in their RHS. Not sure whether that's worth it. > > > About the tests: The Makefile in mod-test is Unix-specific: it uses a > > literal .so extension. I also think the Python script should be > > rewritten in Emacs Lisp, so that Python installation is not required. > > Finally, all of the module tests and associated files should be moved > > into test/, preferably even test/automated/ and made part of the > "make > > check" run. > > > > Yes, tracked in https://github.com/aaptel/emacs-dynamic-module/issues/34 > > You will see that I put there some preliminary stuff, so at least > these tests can be run manually on Windows, by overriding a single > Make variable from the command line. > > Thanks. > --089e0102def4f414030524fe4c0b Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable


Eli Za= retskii <eliz@gnu.org> schrieb am= Fr., 20. Nov. 2015 um 10:54=C2=A0Uhr:
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Thu, 19 Nov 2015 22:41:09 +0000
> Cc: emacs-dev= el@gnu.org
>
> Thanks for the thorough review; I'll prepare patches for all of th= ese issues.

Thank you.

>=C2=A0 =C2=A0 =C2=A0This comment in module.c:
>
>=C2=A0 =C2=A0 =C2=A0/* If checking is enabled, abort if the current thr= ead is not the
>=C2=A0 =C2=A0 =C2=A0Emacs main thread. */
>=C2=A0 =C2=A0 =C2=A0static void check_main_thread (void);
>
>=C2=A0 =C2=A0 =C2=A0confused me, because a later comment says:
>
>=C2=A0 =C2=A0 =C2=A0void module_init (void)
>=C2=A0 =C2=A0 =C2=A0{
>=C2=A0 =C2=A0 =C2=A0/* It is not guaranteed that dynamic initializers r= un in the main thread,
>=C2=A0 =C2=A0 =C2=A0therefore we detect the main thread here. */
>
>=C2=A0 =C2=A0 =C2=A0If dynamic initializers might run in a thread diffe= rent from the Emacs
>=C2=A0 =C2=A0 =C2=A0main thread, then the code in module_init will reco= rd that other
>=C2=A0 =C2=A0 =C2=A0thread, not the Emacs main thread, right?
>
> No, because module_init is guaranteed to be called from the main threa= d because
> main calls it explicitly.

So you are saying that module_init runs in the main thread, but
dynamic initializers might run in another thread?=C2=A0 How's that
possible?


>=C2=A0 =C2=A0 =C2=A0Also, another comment:
>
>=C2=A0 =C2=A0 =C2=A0/* On Windows, we store both a handle to the main t= hread and the
>=C2=A0 =C2=A0 =C2=A0thread ID because the latter can be reused when a t= hread
>=C2=A0 =C2=A0 =C2=A0terminates. */
>
>=C2=A0 =C2=A0 =C2=A0seems to imply that 'main_thread' here is n= ot the Emacs's main thread,
>=C2=A0 =C2=A0 =C2=A0because that thread never terminates as long as the= Emacs session is
>=C2=A0 =C2=A0 =C2=A0alive.
>
>=C2=A0 =C2=A0 =C2=A0So what's the deal here? what does this thread = checking supposed to
>=C2=A0 =C2=A0 =C2=A0detect?
>
> This guards against the Emacs main thread having exited while module c= ode in
> some other thread is still running and attempting to call Emacs functi= ons.

On what OS can this happen?=C2=A0 It cannot happen on Windows, AFAIK,

http://blogs.msdn.com/b/oldnewthing/archiv= e/2010/08/27/10054832.aspx=C2=A0seems to indicate that it is possible t= hat other threads continue running after the main thread has exited.
=C2=A0
so
complicating the code by using a handle is not necessary on Windows,
because the thread ID of the main Emacs thread will never be reused as
long as the Emacs session is alive.=C2=A0 Moreover, we already have that thread's ID in a global variable called dwMainThreadId, computed
during the session startup (albeit after module_init, so it would need
to be moved to an earlier stage), so we could simply compare against
its value in check_main_thread.

My init= ial code didn't use a thread identifier at all because it could have be= en reused after the main thread ended, introducing a subtle bug.
= =C2=A0

> This is undefined behavior, but we included an explicit check if
> checking is enabled because that case is somewhat subtle.

I'm surprised this could happen on some OS.=C2=A0 Can you point to any<= br> details about this?

It's not ruled = out by the C standard or the APIs of operating systems. If Emacs doesn'= t call ExitThread or manipulates the C runtime, then indeed it can't ha= ppen (at least on Windows), but I think it's brittle to rely on such su= btleties without explicitly checking for them (somebody might introduce a c= all to ExitThread in the future without looking at module code).
= =C2=A0

>=C2=A0 =C2=A0 =C2=A0In this code from module.c:
>
>=C2=A0 =C2=A0 =C2=A0Lisp_Object value =3D HASH_VALUE (h, i);
>=C2=A0 =C2=A0 =C2=A0eassert (NATNUMP (value));
>=C2=A0 =C2=A0 =C2=A0const EMACS_UINT refcount =3D XFASTINT (value);
>=C2=A0 =C2=A0 =C2=A0if (refcount >=3D MOST_POSITIVE_FIXNUM)
>=C2=A0 =C2=A0 =C2=A0{
>=C2=A0 =C2=A0 =C2=A0module_non_local_exit_signal_1 (env, Qoverflow_erro= r, Qnil);
>=C2=A0 =C2=A0 =C2=A0return NULL;
>=C2=A0 =C2=A0 =C2=A0}
>
>=C2=A0 =C2=A0 =C2=A0how can the 'if' clause ever be true? refco= unt is an Emacs integer,
>=C2=A0 =C2=A0 =C2=A0as you have just verified, no? And if this somehow = can happen, then
>=C2=A0 =C2=A0 =C2=A0why isn't there a similar check in the other fu= nctions?
>
> refcount can be MOST_POSITIVE_FIXNUM because that's an inclusive b= ound. It's
> important to check that case because later refcount is incremented by = one, and
> if it's equal to MOST_POSITIVE_FIXNUM it would be outside the allo= wed range
> afterwards. No other function increments numbers, thus no other functi= ons need
> this check.

OK, but then either there should have been a comment explaining this,
or, better, the test should have been after the addition.=C2=A0 (Which, by<= br> some strange luck -- or maybe something else -- is just what Paul
already did ;-)

If the test gets moved = after the addition, then we should have a verify(MAX_EMACS_UINT - 1 > MO= ST_POSITIVE_FIXNUM) or use __builtin_add_overflow to make sure the addition= doesn't cause UB.
=C2=A0

>=C2=A0 =C2=A0 =C2=A0Re this fragment from module.c:
>
>=C2=A0 =C2=A0 =C2=A0Lisp_Object ret =3D list4 (Qlambda,
>=C2=A0 =C2=A0 =C2=A0list2 (Qand_rest, Qargs),
>=C2=A0 =C2=A0 =C2=A0documentation ? build_string (documentation) : Qnil= ,
>=C2=A0 =C2=A0 =C2=A0list3 (module_call_func,
>=C2=A0 =C2=A0 =C2=A0envobj,
>=C2=A0 =C2=A0 =C2=A0Qargs));
>
>=C2=A0 =C2=A0 =C2=A0Thou shalt not use build_string, except when you _k= now_ the argument
>=C2=A0 =C2=A0 =C2=A0will always be a pure-ASCII string. Practically, th= is means the
>=C2=A0 =C2=A0 =C2=A0argument must be a constant ASCII string. See these= messages (and the
>=C2=A0 =C2=A0 =C2=A0preceding discussion, if you are interested) for th= e gory details:
>
>=C2=A0 =C2=A0 =C2=A0http://= lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00955.html
>=C2=A0 =C2=A0 =C2=A0http://= lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00976.html
>=C2=A0 =C2=A0 =C2=A0http://= lists.gnu.org/archive/html/bug-gnu-emacs/2013-10/msg00979.html
>
>=C2=A0 =C2=A0 =C2=A0The above should call make_multibyte_string instead= .
>
>
> We had a discussion about encodings in
> https://github.com/aaptel/emacs-dynamic= -module/issues/37. Sorry that this
> didn't get resolved earlier; it's an important point. My sugge= stion would be to
> always mandate specifying an encoding whenever a char* is passed, and = limit
> that to two or three functions dealing with creating strings and acces= sing
> string contents. Would that address your concerns?

No, this is not about encoding at all.=C2=A0 This is about calling
build_string: you should only call it when the argument is a fixed
ASCII string.=C2=A0 If the argument is not fixed or might include non-ASCII=
characters, you should call make_multibyte_string instead.=C2=A0 That's=
because build_string might decide on its own whether to produce a
unibyte or a multibyte string, out of your control, whereas we always
want a multibyte string in this context.

build_string doesn't encode or decode its argument, it creates a Lisp object whose text is taken from the argument.=C2=A0 It's similar to
make_number in that respect.

I'll s= end a patch with my changes to explain my intention.
=C2=A0
=

>=C2=A0 =C2=A0 =C2=A0Again in module.c:
>
>=C2=A0 =C2=A0 =C2=A0/*
>=C2=A0 =C2=A0 =C2=A0* Emacs internal encoding is more-or-less UTF8, let= 's assume utf8
>=C2=A0 =C2=A0 =C2=A0* encoded emacs string are the same byte size.
>=C2=A0 =C2=A0 =C2=A0*/
>
>=C2=A0 =C2=A0 =C2=A0if (!buffer || length =3D=3D 0 || *length-1 < ra= w_size)
>=C2=A0 =C2=A0 =C2=A0{
>=C2=A0 =C2=A0 =C2=A0*length =3D raw_size + 1;
>=C2=A0 =C2=A0 =C2=A0return false;
>=C2=A0 =C2=A0 =C2=A0}
>
>=C2=A0 =C2=A0 =C2=A0I don't understand why you need this assumption= . You are going to
>=C2=A0 =C2=A0 =C2=A0encode the string in a moment, so why not test '= ;length' against the
>=C2=A0 =C2=A0 =C2=A0size you actually obtain there? (The above test wil= l misfire when the
>=C2=A0 =C2=A0 =C2=A0original string includes characters above the Unico= de codespace, which
>=C2=A0 =C2=A0 =C2=A0require 5 bytes internally, but their encoding maps= them to Unicode
>=C2=A0 =C2=A0 =C2=A0codepoints which cannot take more than 4 bytes. So = you might reject
>=C2=A0 =C2=A0 =C2=A0perfectly valid calls.)
>
>=C2=A0 =C2=A0 =C2=A0In module_make_string you have:
>
>=C2=A0 =C2=A0 =C2=A0/* Assume STR is utf8 encoded */
>=C2=A0 =C2=A0 =C2=A0return lisp_to_value (env, make_string (str, length= ));
>
>=C2=A0 =C2=A0 =C2=A0The discussion I pointed to above concluded that &l= t;quote>make_string is
>=C2=A0 =C2=A0 =C2=A0a bug</quote>. So please use make_multibyte_s= tring here instead.
>
>
> See above; my suggestion would be to change the string handling code b= y
> limiting encoding and decoding to a small set of functions where the e= ncoding
> would have to be specified explicitly.

Once again, that's not the issue.=C2=A0 AFAIU, you have already specifi= ed
(albeit implicitly) that all strings passed as arguments to and from
the module functions are UTF-8 encoded.=C2=A0 And that's fine by me, th= at's
not what these comments are about.

The first comment is about a minor issue in
module_copy_string_contents.=C2=A0 Consider this fragment from that
function:

=C2=A0 ptrdiff_t raw_size =3D SBYTES (lisp_str);

=C2=A0 /* Emacs internal encoding is more-or-less UTF8, let's assume ut= f8
=C2=A0 =C2=A0 =C2=A0encoded emacs string are the same byte size.=C2=A0 */
=C2=A0 if (!buffer || length =3D=3D 0 || *length-1 < raw_size)
=C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 *length =3D raw_size + 1;
=C2=A0 =C2=A0 =C2=A0 return false;
=C2=A0 =C2=A0 }

=C2=A0 Lisp_Object lisp_str_utf8 =3D ENCODE_UTF_8 (lisp_str);
=C2=A0 eassert (raw_size =3D=3D SBYTES (lisp_str_utf8));
=C2=A0 *length =3D raw_size + 1;
=C2=A0 memcpy (buffer, SDATA (lisp_str_utf8), SBYTES (lisp_str_utf8));
=C2=A0 buffer[raw_size] =3D 0;

The comment and the 'if' clause after it in effect tell that you wa= nt
to check whether 'buffer' has enough space to hold the encoded stri= ng,
but you don't know how many bytes will be needed for that.=C2=A0 So you=
make some assumption (which, as I pointed out, could be wrong in rare
cases), and reject the strings that fail your test.=C2=A0 What I'm sati= ng
is that after the call to ENCODE_UTF_8, you can compute the exact
length in bytes of the UTF-8 encoded string, so if you move the test
there, you won't have to make any assumptions and you won't err.
The second comment is again about calling make_string (which
build_string calls internally): use make_multibyte_string instead.

>=C2=A0 =C2=A0 =C2=A0static void module_set_user_finalizer (emacs_env *e= nv,
>=C2=A0 =C2=A0 =C2=A0emacs_value uptr,
>=C2=A0 =C2=A0 =C2=A0emacs_finalizer_function fin)
>=C2=A0 =C2=A0 =C2=A0{
>=C2=A0 =C2=A0 =C2=A0check_main_thread ();
>=C2=A0 =C2=A0 =C2=A0eassert (module_non_local_exit_check (env) =3D=3D e= macs_funcall_exit_return);
>=C2=A0 =C2=A0 =C2=A0const Lisp_Object lisp =3D value_to_lisp (uptr); >=C2=A0 =C2=A0 =C2=A0if (! USER_PTRP (lisp)) module_wrong_type (env, Qus= er_ptr, lisp);
>=C2=A0 =C2=A0 =C2=A0XUSER_PTR (lisp)->finalizer =3D fin; <<<= ;<<<<<<<<<<<<<<<<<<&l= t;<<<<<<<<<<<
>=C2=A0 =C2=A0 =C2=A0}
>
>=C2=A0 =C2=A0 =C2=A0No validity checks on 'fin'?
>
> How should it be validated? In C an arbitrary (invalid) pointer could = be
> passed. I think we just have to accept that this is UB.

Given the extensive checks elsewhere in the file, this surprised me,
that's all.=C2=A0 How to test it? calling valid_pointer_p would be one<= br> thing I'd suggest.=C2=A0 Maybe there are other (perhaps platform-depend= ent)
checks possible here, I would have to look around.=C2=A0 For example, it would be good to make sure this points into executable code.

You can't validate pointers in C. For example, is= the following pointer valid?

long a;
do= uble *p =3D (double *)(&a);

The answer is no; = this is an aliasing violation, and dereferencing p is UB, but you won't= detect this by trying to read the process's address space. See also=C2= =A0http://blogs.msdn.com/b/oldnewthing/archive/2006/09/27/773741.aspx. The best way to detect such issues is running under asan.

>=C2=A0 =C2=A0 =C2=A0if (len > INT_MAX || len < envptr->min= _arity || (envptr->max_arity >=3D 0 &&
>=C2=A0 =C2=A0 =C2=A0len > envptr->max_arity))
>=C2=A0 =C2=A0 =C2=A0xsignal2 (Qwrong_number_of_arguments, module_format= _fun_env (envptr),
>=C2=A0 =C2=A0 =C2=A0make_number (len));
>
>=C2=A0 =C2=A0 =C2=A0Why the test against INT_MAX? EMACS_INT can legitim= ately be a 64-bit
>=C2=A0 =C2=A0 =C2=A0data type with values far exceeding INT_MAX. Why im= pose this
>=C2=A0 =C2=A0 =C2=A0limitation here?
>
> Because the nargs argument in the module interface is an int.

Shouldn't it be EMACS_INT instead?


> If you think functions with more than INT_MAX arguments should be supp= orted,
> the type for nargs should be changed to int64.

I thought your design decision was already to use int64 for any
integer argument, no?

=C2=A0

Anyway, something that comes from Lisp should allow EMACS_INT, IMO.

It can't use EMACS_INT because that's = an internal type defined in lisp.h, and emacs-module.h can't include li= sp.h. Furthermore is should be a type with a static size to allow for ABI s= tability. Mismatches should be dealt with by explicit bounds checks.
<= div>=C2=A0

>=C2=A0 =C2=A0 =C2=A0allocate_emacs_value calls malloc; shouldn't it= call xmalloc instead,
>=C2=A0 =C2=A0 =C2=A0or at least conform to the XMALLOC_BLOCK_INPUT_CHEC= K protocol?
>
> If xmalloc is called, then we need to make sure that no signals (longj= mps) can
> escape to module code. If appropriate setjmps are in place that should= be
> doable, but I need to check whether there are edge cases.

Paul already addressed that: xmalloc in Emacs does double or even
triple duty, and it would be a shame to lose that here.=C2=A0 So I think we=
should try to find a way to modify xmalloc such that it satisfies the
module code as well.

Could xmalloc grow= a variant that is guaranteed not to call longjmp?
=C2=A0

>=C2=A0 =C2=A0 =C2=A0Btw, I wonder whether we should provide a more frie= ndly capabilities
>=C2=A0 =C2=A0 =C2=A0for retrieving the function name and its module. dl= addr is not
>=C2=A0 =C2=A0 =C2=A0portable enough, and we have all the information at= our fingertips
>=C2=A0 =C2=A0 =C2=A0in the module's init function, we just need to = keep it instead of
>=C2=A0 =C2=A0 =C2=A0throwing it away. I envision debugging module-relat= ed problems will
>=C2=A0 =C2=A0 =C2=A0not be a very rare situation, so we need any help w= e can get. WDYT?
>
> Hmm, I don't know whether we have access to the function name with= out using
> dladdr.

I thought about recording their names when you bind them to Lisp
symbols.

That would be nice, however it= would require set_symbol_function, Fdefalias, etc. to explicitly check for= module functions in their RHS. Not sure whether that's worth it.
=
=C2=A0

>=C2=A0 =C2=A0 =C2=A0About the tests: The Makefile in mod-test is Unix-s= pecific: it uses a
>=C2=A0 =C2=A0 =C2=A0literal .so extension. I also think the Python scri= pt should be
>=C2=A0 =C2=A0 =C2=A0rewritten in Emacs Lisp, so that Python installatio= n is not required.
>=C2=A0 =C2=A0 =C2=A0Finally, all of the module tests and associated fil= es should be moved
>=C2=A0 =C2=A0 =C2=A0into test/, preferably even test/automated/ and mad= e part of the "make
>=C2=A0 =C2=A0 =C2=A0check" run.
>
> Yes, tracked in https://github.com/aapt= el/emacs-dynamic-module/issues/34

You will see that I put there some preliminary stuff, so at least
these tests can be run manually on Windows, by overriding a single
Make variable from the command line.

Thanks.
--089e0102def4f414030524fe4c0b--