From: Philipp Stephani
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sat, 21 Nov 2015 09:19:48 +0000
To: Eli Zaretskii
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
In-Reply-To: <83wptb4ung.fsf@gnu.org>


Eli Zaretskii <eliz@gnu.org> wrote on Sat, 21 Nov 2015 at 09:35:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Fri, 20 Nov 2015 23:22:49 +0000
> > Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
> >
> >     Ah, you are talking about C++ dynamic initializers! So the model is
> >     that someone writes a module in C++, starts a thread there, and then
> >     inside some dynamic initializer calls emacs-module interface
> >     functions, is that it? If that's the situation, I'd suggest a
> >     prominent commentary describing this in the source.
> >
> > The SO post talks about C++ but the issue is the same in C. AFAIK with C11 and
> > C++11 the execution models are harmonized.

> It doesn't matter for the issue at hand whether it's C, C++, Java, or
> whatever.  My original, incorrect interpretation of what you wrote
> was that you were talking about initializers that are part of the
> module code, i.e. the emacs_module_init function they implement.

> > It seems the comment is overly confusing. It is supposed to warn about the
> > following. Naively, if you wanted to test whether you are in the main thread,
> > you would do (modulo types and naming):
> >
> > static thread_id main_thread = get_current_thread();
> > bool in_main_thread() { return get_current_thread() == main_thread; }
> >
> > The dynamic initializer here is the first "get_current_thread()"; it is not
> > guaranteed to run in the main thread, so "main_thread" is not guaranteed to
> > contain the main thread ID.  Therefore you have to do:
> >
> > static thread_id main_thread; // initialized later
> > int main() {
> >   // guaranteed to be in the main thread
> >   main_thread = get_current_thread();
> > }
> >
> > That's all. I'm not aware of any runtime that would run dynamic initializers
> > outside of the main thread, but it's not impossible, and it is easy to protect
> > against.

> AFAIK, code that does this:
>
>   static thread_id main_thread = get_current_thread();
>
> is invalid in C, because such initializers cannot call functions.  So
> it would surprise me to see code which tried to record its thread ID
> before 'main'.

Indeed, you're right. (Yes, I'm a C++ programmer :))
Sorry for the confusion.
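In C, the safe pattern from the exchange above (leave the static object uninitialized, then assign it from inside main) can be sketched with POSIX threads. This is a minimal sketch, assuming a POSIX platform: pthread_t and pthread_self stand in for the email's placeholder thread_id and get_current_thread, and record_main_thread, in_main_thread and in_main_thread_from_worker are hypothetical helpers, not part of the Emacs module API.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Invalid in C: an object with static storage duration needs a
   constant-expression initializer, so this would not compile:
     static pthread_t main_thread = pthread_self ();  */
static pthread_t main_thread;  /* zero-initialized, assigned at run time */

/* Hypothetical helper: call once, early in main(), which always runs
   in the main thread, before any other thread is started.  */
void record_main_thread (void)
{
  main_thread = pthread_self ();
}

/* pthread_t is an opaque type, so compare with pthread_equal, not ==.  */
bool in_main_thread (void)
{
  return pthread_equal (pthread_self (), main_thread) != 0;
}

/* Thread body for the demo helper: store what in_main_thread() says.  */
static void *probe_thread (void *result)
{
  *(bool *) result = in_main_thread ();
  return NULL;
}

/* Hypothetical demo helper: ask in_main_thread() from a second thread.
   Returns 1 or 0, or -1 if the thread could not be created.  */
int in_main_thread_from_worker (void)
{
  pthread_t worker;
  bool result = true;
  if (pthread_create (&worker, NULL, probe_thread, &result) != 0)
    return -1;
  pthread_join (worker, NULL);
  return result ? 1 : 0;
}
```

After record_main_thread () has run in main, in_main_thread () is true on that thread only; the worker-thread helper exists just to show the false case.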

> So I think we should reword that comment to be much less mysterious
> and confusing than it is now.  (Look how much we needed to talk
> about for you to explain to me what the intent of the comment was.)

Since the alternative isn't even legal, the comment can just go away.

> >     Anyway, thanks for explaining this, I now know how to change the code
> >     to DTRT on MS-Windows wrt the thread checks.
> >
> > This is unfortunately all surprisingly subtle and vaguely defined. See e.g.
> > http://stackoverflow.com/q/19744250/178761 (apparently the standards are vague
> > about what happens to detached threads after main has exited).

> I don't see how that affects the issue at hand.  The issue at hand is
> whether a thread ID of the main thread could be reused while some of
> the other threads belonging to the Emacs process are still running.
> And the answer to that on MS-Windows is AFAIU a sound NO, because as
> long as the Emacs process is alive, it holds a handle on the main
> thread, which precludes the OS from discarding that thread's kernel
> object.  Why?  Because a thread handle can be, and is, used to query
> the OS about that thread's conditions, like its exit code, or to wait
> for its completion in the likes of WaitForSingleObject.  So the kernel
> object that represents the thread must be kept by the OS as long as at
> least one open handle for the thread exists, and that prevents the OS
> from reusing the thread ID.

Does it actually hold that handle? It sounds reasonable, but I can't find it documented.

> >     > See also
> >     > http://blogs.msdn.com/b/oldnewthing/archive/2006/09/27/773741.aspx.
> >
> >     We don't use IsBadWritePtr on Windows to check this, see
> >     w32_valid_pointer_p for how this is actually implemented.
> >
> > Much of this applies generally.
> > > "But what should I do, then, if somebody passes me a bad pointer?"
> > > You should crash.

> Which is what we do, since eassert aborts.  We will just do it sooner,
> which IME is a Good Thing.

OK. We should be a bit careful with the current implementation of valid_pointer_p, though: AFAIK write(2) could cast the pointer it receives to char* and dereference it in user space. But that's probably no worse than not checking it at all, and it is unrelated to modules.
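The write(2)-based check being discussed works by letting the kernel, rather than the process, touch the memory. Below is a minimal sketch of that idea, assuming Linux (which reports EFAULT for an unreadable buffer); probe_readable is a hypothetical name and this is not Emacs's actual valid_pointer_p source. It inherits exactly the caveat raised above: nothing prevents a libc wrapper from touching the buffer in user space first.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical probe: ask the kernel to copy SIZE bytes at P into a
   pipe.  For an unmapped range Linux fails the write with EFAULT
   instead of the process receiving SIGSEGV.  SIZE must be small enough
   to fit in the pipe buffer, otherwise write() would block.
   Returns 1 if all SIZE bytes were readable, 0 if not, -1 if no pipe.  */
int probe_readable (const void *p, size_t size)
{
  int fd[2];
  if (pipe (fd) != 0)
    return -1;
  ssize_t n = write (fd[1], p, size);
  close (fd[0]);
  close (fd[1]);
  /* A short write means the range was only partly readable.  */
  return n >= 0 && (size_t) n == size;
}
```

On MS-Windows the same role is played by w32_valid_pointer_p, as noted in the quoted text; this sketch covers only the POSIX side.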

> >     Anyway, I'm surprised by this extreme POV: even if we cannot validate
> >     a pointer 100%, surely it doesn't mean we cannot or shouldn't do some
> >     partial job? Why this "all or nothing" approach?
> >
> > We can check whether it's NULL. Apart from that, everything else is outside of
> > the C standard.

> Emacs is not a Standard C program, far from it.  It uses a lot of
> stuff outside of any C standard, and for a very good reason: it is a
> large and complicated program with many features that require access
> to OS facilities.
>
> IOW, using only Standard C features is not, and cannot be, a
> requirement for Emacs code.

Fair enough.

> >     We need to devise a way for it to detect that it was called from
> >     emacs-module.c, the rest is simple, I think.
> >
> > Hmm, why does it need to detect anything? Can't it just be a different function
> > that doesn't signal, similar to push_handler and push_handler_nosignal?

> I don't think we want each of its callers to call the signaling part
> by itself.  That would be repeating the problem with malloc itself:
> many programs simply neglect to include the code which does TRT when
> it returns NULL.  xmalloc solves this, and makes sure the (non-trivial)
> error action and message are always the same in that case.
>
> We need a variant of this for memory allocated on behalf of modules, I
> think.

But this would require modules to be prepared for handling longjmps, which in general they aren't.
In an "unsafe" language like C, we can't do without the users' cooperation. If users ignore NULL returns from malloc, that's a bug and we can't do much about it. We could make it harder to accidentally ignore the result:

  bool emacs_malloc(size_t size, void **out) __attribute__((warn_unused_result));

but that requires a compiler extension and is not really consistent with the rest of the module interface, where NULL is regularly returned on failure.
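The alternative floated here can be sketched as follows. This is a minimal sketch, assuming GCC or Clang for the attribute (it compiles to a plain declaration elsewhere); emacs_malloc and WARN_UNUSED_RESULT are hypothetical names taken from or modeled on the proposed signature, not part of the actual Emacs module API. Unlike an xmalloc-style wrapper, it never performs a non-local exit, so the module stays in control of error handling.

```c
#include <stdbool.h>
#include <stdlib.h>

/* warn_unused_result is a GCC/Clang extension; expand to nothing
   on other compilers.  */
#if defined __GNUC__
# define WARN_UNUSED_RESULT __attribute__ ((warn_unused_result))
#else
# define WARN_UNUSED_RESULT
#endif

/* Hypothetical allocator with the proposed signature: the compiler
   warns if a caller discards the returned success flag.  */
bool emacs_malloc (size_t size, void **out) WARN_UNUSED_RESULT;

bool emacs_malloc (size_t size, void **out)
{
  /* Request at least one byte so that a NULL result always means
     failure, even when SIZE is 0.  */
  *out = malloc (size != 0 ? size : 1);
  return *out != NULL;
}
```

A bare statement such as `emacs_malloc (n, &p);` then draws a -Wunused-result warning, while `if (!emacs_malloc (n, &p))` followed by the module's own error path does not. As noted above, the trade-off is that this bool-plus-out-parameter style differs from the rest of the module interface, which returns NULL on failure.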