From: "Thompson, David"
Subject: bug#21694: 'clone' syscall binding unreliable
Date: Fri, 16 Oct 2015 19:12:53 -0400
To: Ludovic Courtès
Cc: 21694@debbugs.gnu.org, David Thompson
References: <87zizio8dc.fsf@gnu.org>
In-Reply-To: <87zizio8dc.fsf@gnu.org>

On Fri, Oct 16, 2015 at 4:39 PM, Ludovic Courtès wrote:
> I’m reporting the problem and (hopefully) the solution, but I think we’d
> better double-check this.
>
> The problem: Running the test below in a loop sometimes gets a SIGSEGV
> in the child process (on x86_64, libc 2.22.)
>
> --8<---------------cut here---------------start------------->8---
> (use-modules (guix build syscalls) (ice-9 match))
>
> (match (clone (logior CLONE_NEWUSER
>                       CLONE_CHILD_SETTID
>                       CLONE_CHILD_CLEARTID
>                       SIGCHLD))
>   (0
>    (throw 'x))                                 ;XXX: sometimes segfaults
>   (pid
>    (match (waitpid pid)
>      ((_ . status)
>       (pk 'status status)
>       (exit (not (status:term-sig status)))))))
> --8<---------------cut here---------------end--------------->8---
>
> Looking at (guix build syscalls) though, I see an ABI mismatch between
> our definition and the actual ‘syscall’ C function, and between our
> ‘clone’ definition and the actual C function.
>
> This leads to the attached patch, which also fixes the above problem for me.
>
> Could you test this patch?

The patch looks good.  Thanks for catching this!

> Now, there remains the question of CLONE_CHILD_SETTID and
> CLONE_CHILD_CLEARTID.  Since we’re passing NULL for ‘ctid’, I expect
> that these flags have no effect at all.

I added those flags in commit ee78d02 because they solved a real issue
I ran into.  Adding those flags made 'clone' look like a
'primitive-fork' call when examined with strace.

> Conversely, libc uses these flags to update the thread ID in the child
> process (x86_64/arch-fork.h):
>
> --8<---------------cut here---------------start------------->8---
> #define ARCH_FORK() \
>   INLINE_SYSCALL (clone, 4,                                                   \
>                   CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, 0,     \
>                   NULL, &THREAD_SELF->tid)
> --8<---------------cut here---------------end--------------->8---
>
> This is certainly useful, but we’d have troubles doing it from the FFI…
> It may that this is fine if the process doesn’t use threads.

Right, so here's what 'primitive-fork' does:

clone(child_stack=0,
      flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
      child_tidptr=0x7fc5398cea10) = 13247

Here's what 'clone' does:

clone(child_stack=0,
      flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
      child_tidptr=0) = 14038

In practice it may not be a problem since most of the time you'd
'exec' after cloning.

Is there any reliable way to get a hold of whatever THREAD_SELF is?

I wish the libc 'clone' function didn't have that silly callback and
behaved like 'fork', then we could have avoided these issues
altogether.

- Dave
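
For reference, the difference between the two strace lines above can be
reproduced outside Guile.  The following is only a minimal C sketch,
assuming Linux on x86_64 (where the raw syscall takes flags, stack,
ptid, ctid, tls in that order).  The names 'fork_like_clone' and
'child_sees_own_tid' are made up for illustration; they are not part of
Guix or libc.  It issues the same flags as ARCH_FORK, once with a real
'ctid' pointer and once with NULL, and has the child check whether the
kernel stored its TID.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Raw 'clone' with the flags ARCH_FORK uses.  Assumes the x86_64
   argument order: flags, stack, ptid, ctid, tls.  Every argument is
   widened to a full register, since 'syscall' is variadic.  */
pid_t fork_like_clone(pid_t *ctid)
{
    unsigned long flags =
        CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD;
    return (pid_t) syscall(SYS_clone, flags, 0UL, (void *) NULL,
                           (void *) ctid, 0UL);
}

/* Returns 1 if the child found its own TID in the slot, 0 if the slot
   was left untouched, and -1 on error.  */
int child_sees_own_tid(int pass_ctid)
{
    static pid_t slot;
    slot = -1;

    pid_t pid = fork_like_clone(pass_ctid ? &slot : NULL);
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask the kernel directly rather than trusting any
           libc-cached thread state, then exit without running
           atexit handlers.  */
        pid_t self = (pid_t) syscall(SYS_gettid);
        _exit(slot == self ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Calling child_sees_own_tid(1) corresponds to the 'primitive-fork' trace
(non-zero child_tidptr) and child_sees_own_tid(0) to the 'clone'
binding (child_tidptr=0).  Note that this sketch only writes to a plain
variable; it does not update libc's own THREAD_SELF->tid, which is the
part that is hard to reach from the FFI.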