From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#29335: 'guix publish' workers occasionally crash
Date: Sun, 19 Nov 2017 23:48:47 +0100
Message-ID: <87d14dzxyo.fsf@gnu.org>
References: <878tf55i6u.fsf@gnu.org>
In-Reply-To: <878tf55i6u.fsf@gnu.org> (Ludovic Courtès's message of "Fri, 17 Nov 2017 11:10:49 +0100")
To: 29335-done@debbugs.gnu.org

ludo@gnu.org (Ludovic Courtès) wrote:

> On berlin I’ve noticed that the ‘guix publish’ workers would
> occasionally stop working: the main thread would keep replying to HTTP
> requests, but the worker threads would no longer do anything, and would
> leave behind them a bunch of .tmp files in /var/cache/guix/publish.
>
> I captured the output of ‘guix publish’ (guix-0.13.0-8.357ab93) and the
> only clue I have is this:
>
>   GET /6kl9ydqmgklcqhxswg6v5isq5n1ih5gp.narinfo
>   In guix/workers.scm:
>        74:9  2 (_)
>       78:32  1 (_ srfi-34 #)
>   In unknown file:
>              0 (make-stack #t)
>   ERROR: In procedure make-stack:
>   ERROR: Throw to key `srfi-34' with args `(#)'.
>   GET /fgiih42mg2sr82mbmzf56grvrf021im6.narinfo

Good news: this is fixed in commit 85f4f7b79040d982c6a655c898b4cd00d868fa9c.

The problem could be reproduced by running ‘guix publish’ with 10 or more
workers and then triggering nar compression en masse with ‘guix weather’.

The EBADF error was due to a race condition in zlib.scm when closing gzip
output ports:

      ;; 'gzclose' closes the underlying file descriptor.  'close-port'
      ;; calls close(2) and gets EBADF, which we swallow.
      (gzclose gzfile)
      (ignore-EBADF (close-port port)))

There was a window after the ‘gzclose’ call during which the file
descriptor shared by GZFILE and PORT above could be reused by another
thread (for instance, another worker opening a file), and ‘close-port’
would then close that unrelated descriptor.

Thanks,
Ludo’.
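The race above can be replayed deterministically in a single thread, which makes the mechanism easier to see. This is a minimal sketch, not the code from the fix, assuming Guile 2.2 or later on a POSIX system: /dev/null stands in for the real nar files, `close-fdes` plays the role of `gzclose` closing the descriptor behind the port's back, and the names `errno-of`, `victim` and `lib-fd` are hypothetical. Because open(2) returns the lowest free descriptor, the "victim" opened right after gets the same number, so the stale `close-port` closes it. The second half shows one possible way to avoid the double close (not necessarily what the fix does): hand the library its own duplicate descriptor, as zlib's documentation suggests for `gzdopen` when the caller wants to keep its descriptor, so that each descriptor is closed exactly once.

```scheme
;; Single-threaded replay of the descriptor-reuse race (illustrative sketch).
;; Assumptions: Guile >= 2.2 on POSIX; /dev/null stands in for real files;
;; 'close-fdes' simulates 'gzclose' closing the fd behind the port's back.

(define (errno-of thunk)
  ;; Hypothetical helper: run THUNK, return #f on success or the errno
  ;; carried by a 'system-error' exception.
  (catch 'system-error
    (lambda () (thunk) #f)
    (lambda args (system-error-errno args))))

;; 1. Unsafe: the "library" and the port both close the SAME descriptor.
(define port (open-input-file "/dev/null"))
(define fd (fileno port))                     ;'fileno' leaves revealed count at 0
(close-fdes fd)                               ;like 'gzclose'
;; Simulated "other thread": open(2) returns the lowest free descriptor,
;; which is FD again.
(define victim (open-fdes "/dev/null" O_RDONLY))
(close-port port)                             ;closes VICTIM's descriptor!
(define unsafe-errno (errno-of (lambda () (stat victim))))

;; 2. Safer: give the "library" its own duplicate descriptor, so each
;; descriptor is closed exactly once.
(define port2 (open-input-file "/dev/null"))
(define lib-fd (dup->fdes (fileno port2)))    ;owned by the "library"
(close-fdes lib-fd)                           ;like 'gzclose' on the duplicate
(define victim2 (open-fdes "/dev/null" O_RDONLY)) ;may reuse LIB-FD's number
(close-port port2)                            ;closes only PORT2's own fd
(define safe-errno (errno-of (lambda () (stat victim2))))
(close-fdes victim2)

(format #t "unsafe: ~a, safe: ~a~%" unsafe-errno safe-errno)
```

In the unsafe half, `stat` on the victim fails with EBADF even though the victim itself never closed anything; in the safer half the victim is untouched. In the real bug the reuse only happens when another thread opens something inside the window, which is why it took many concurrent workers to trigger.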