From mboxrd@z Thu Jan 1 00:00:00 1970 From: ludo@gnu.org (Ludovic =?UTF-8?Q?Court=C3=A8s?=) Subject: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Date: Tue, 28 Aug 2018 11:47:12 +0200 Message-ID: <874lfen7q7.fsf@gnu.org> References: <20180329200803.GA15842@jasmine.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:53549) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuabW-0003kf-L4 for bug-guix@gnu.org; Tue, 28 Aug 2018 05:48:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fuabS-0003w1-QR for bug-guix@gnu.org; Tue, 28 Aug 2018 05:48:06 -0400 Received: from debbugs.gnu.org ([208.118.235.43]:58052) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fuabS-0003vt-Kb for bug-guix@gnu.org; Tue, 28 Aug 2018 05:48:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1fuabS-0000dR-GR for bug-guix@gnu.org; Tue, 28 Aug 2018 05:48:02 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <20180329200803.GA15842@jasmine.lan> (Leo Famulari's message of "Thu, 29 Mar 2018 16:08:03 -0400") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: Leo Famulari Cc: 30993@debbugs.gnu.org Hi Leo, Leo Famulari skribis: > Since the update to Shepherd 0.4.0, I've found that OpenSSH's sshd is > killed almost immediately after it starts with signal 15. I confirmed > the issue started with the Shepherd upgrade by bisecting our Git > history. > > I can reproduce the issue from commit > b6beda1d6b9093a8493b5c3cde33ed522242c451 (gnu: Add botan.). I=E2=80=99m =E2=80=9Chappy=E2=80=9D to say that I experienced this on a ser= ver=E2=80=94not having ssh access to a remote server is fairly annoying, I definitely sympathize=E2=80= =A6 What I see is this: --8<---------------cut here---------------start------------->8--- Aug 6 07:56:40 localhost shepherd[1]: Service loopback has been started.=20 [...] Aug 6 07:56:51 localhost sshd[606]: Server listening on 0.0.0.0 port 22. [...] Aug 6 07:57:05 localhost shepherd[1]: Service ssh-daemon could not be star= ted.=20 [...] Aug 6 07:57:46 localhost vmunix: [ 10.049791] random: ssh-keygen: uninit= ialized urandom read (32 bytes read) --8<---------------cut here---------------end--------------->8--- (Note that the last message was pulled from /dev/kmsg by syslogd, but it=E2=80=99s about an event that actually occurred before the first message= .) It waited for ~15 seconds, although =E2=80=98%pid-file-timeout=E2=80=99 in = (shepherd service) is only 5 seconds. The SIGTERM you were seeing very likely comes from this bit: --8<---------------cut here---------------start------------->8--- (match (read-pid-file pid-file #:max-delay pid-file-timeout) (#f (catch-system-error (kill pid SIGTERM)) #f) ((? integer? pid) pid)) --8<---------------cut here---------------end--------------->8--- On another machine: --8<---------------cut here---------------start------------->8--- Aug 28 09:10:49 localhost sshd[435]: Server listening on 0.0.0.0 port 22. Aug 28 09:10:49 localhost sshd[435]: Server listening on :: port 22. [...] Aug 28 09:10:50 localhost shepherd[1]: Service ssh-daemon has been started.= =20 --8<---------------cut here---------------end--------------->8--- I wonder if this has to do with IPv6 (the failing case lacks the =E2=80=9CS= erver listening on ::=E2=80=9D line), or if it=E2=80=99s just sshd occasionally t= aking a long time to start. Is it easily reproducible for you? Did you eventually gather more details? Thanks, Ludo=E2=80=99.