unofficial mirror of bug-guix@gnu.org 
* bug#37309: ‘ssh-daemon’ service fails to start at boot
@ 2019-09-05 13:18                 ` Giovanni Biscuolo
  2019-09-08  4:19                   ` 宋文武
                                     ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Giovanni Biscuolo @ 2019-09-05 13:18 UTC
  To: 37309


Hi,

Following a recent discussion on guix-sysadmin, I have to confirm the
ssh-daemon issue, since it is still happening on some of the machines I
administer.

Previous possibly related bug reports are
https://issues.guix.gnu.org/issue/30993 and
https://issues.guix.gnu.org/issue/32197

Unfortunately this issue is *not* easily reproducible: it depends on some
timing factor that is mysterious (to me). AFAIU it does *not* depend on the
shepherd version; it probably depends on "something" related to IPv6
(details below).

Andreas Enge <andreas@enge.fr> writes:

[...]

> My impression is that the problem is still there. I am quite certain it
> happened when I rebooted dover, since I had to connect on the serial console
> to manually restart the ssh service.

I'm sure it happened when milano-guix-1 was rebooted due to data centre
maintenance, and it happened yesterday to one of my personal Guix machines
at the office.

[...]

My situation is similar to the one observed by Andreas:

> Well, it is in /var/log/messages:
> Aug  3 21:11:38 localhost sshd[360]: Server listening on 0.0.0.0 port 22.
> Aug  3 21:11:55 localhost shepherd[1]: Service ssh-daemon could not be started.

--8<---------------cut here---------------start------------->8---
[...]
Sep  4 21:46:02 localhost shepherd[1]: Service syslogd has been started.
[...]
Sep  4 21:46:03 localhost shepherd[1]: Service loopback has been started.
[...]
Sep  4 21:46:22 localhost vmunix: [    0.226337] PCI: Using configuration type 1 for base access
Sep  4 21:46:09 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
[...]
Sep  4 21:46:24 localhost shepherd[1]: Service networking has been started.
[...]
Sep  4 21:46:12 localhost sshd[577]: Server listening on 0.0.0.0 port 22.
[...]
Sep  4 21:46:30 localhost vmunix: [    0.250107] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
Sep  4 21:46:13 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
[...]
Sep  4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1
[...]
Sep  4 21:46:33 localhost shepherd[1]: Service ssh-daemon could not be started.
[...]
Sep  4 21:46:47 localhost vmunix: [    0.731142] Segment Routing with IPv6
--8<---------------cut here---------------end--------------->8---

Please note the timing of the dhclient and the sshd processes: I
copied the lines in the order they appear in /var/log/messages, but their
timestamps are not chronological. Does this mean something, or is it
irrelevant?

So the sshd process started (as far as I can see there is no trace of
it being stopped), and soon afterwards shepherd reported that ssh-daemon
could not be started.

Logging in from the console, I see that ssh-daemon is stopped but enabled:

--8<---------------cut here---------------start------------->8---
Status of ssh-daemon:
  It is stopped.
  It is enabled.
  Provides (ssh-daemon).
  Requires (syslogd loopback).
  Conflicts with ().
  Will be respawned.
--8<---------------cut here---------------end--------------->8---

[...]

If I start it via `sudo herd start ssh-daemon`, it starts immediately,
as in Andreas' experience:

> Aug  3 21:13:10 localhost sshd[385]: Server listening on 0.0.0.0 port 22.
> Aug  3 21:13:10 localhost sshd[385]: Server listening on :: port 22.
> Aug  3 21:13:11 localhost shepherd[1]: Service ssh-daemon has been started.

--8<---------------cut here---------------start------------->8---
Sep  5 13:38:55 localhost sshd[745]: Server listening on 0.0.0.0 port 22.
Sep  5 13:38:55 localhost sshd[745]: Server listening on :: port 22.
Sep  5 13:38:55 localhost shepherd[1]: Service ssh-daemon has been started.
--8<---------------cut here---------------end--------------->8---

Please notice the difference from above: this time the sshd server is
also listening on the IPv6 address ::, while in the log above it was only
listening on the IPv4 address 0.0.0.0.

Does the failure have something to do with IPv6 not being available
when sshd starts for the first time after a reboot?

Please have a look at the following /var/log/messages excerpt from my
system after a successful ssh-daemon start right after a reboot (no
"manual" intervention):

--8<---------------cut here---------------start------------->8---
Sep  5 14:45:00 localhost vmunix: [    0.247544] pci 0000:00:14.0: reg 0x10: [mem 0xf7c20000-0xf7c2ffff 64bit]
Sep  5 14:44:45 localhost sshd[574]: Server listening on 0.0.0.0 port 22.
[...]
Sep  5 14:44:47 localhost sshd[574]: Server listening on :: port 22.
[...]
Sep  5 14:45:05 localhost shepherd[1]: Service ssh-daemon has been started.
--8<---------------cut here---------------end--------------->8---

Bingo? This time sshd was also listening on :: and it worked right after a reboot.

It really seems to have something to do with IPv6, but I cannot work out
exactly what :-S (do I have to disable IPv6 in my configs?)

For completeness, I have to say that the issue happened yesterday after
a `guix system reconfigure`; this is my current system generation:

--8<---------------cut here---------------start------------->8---
Generation 8    Sep 04 2019 17:19:08    (current)
  file name: /var/guix/profiles/system-8-link
  canonical file name: /gnu/store/iw2ayn696f8ipmd5gzw9fxljf9h8w4pr-system
  label: GNU with Linux-Libre 5.2.11
  bootloader: grub-efi
  root device: UUID: 26bd54ec-4e74-4b3a-96ff-58f2f34e4a1a
  kernel: /gnu/store/xgl60ivx8p5p79zjbf08p4x09881wf4s-linux-libre-5.2.11/bzImage
--8<---------------cut here---------------end--------------->8---

I reconfigured with this Guix version:

--8<---------------cut here---------------start------------->8---
g@batondor ~$ sudo -i guix describe 
Generation 6    Sep 04 2019 17:17:02    (current)
  guix 5ee1c04
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 5ee1c0459eebdd3b7771abaeab0f0b52ff86fdd5
--8<---------------cut here---------------end--------------->8---

This is the shepherd version:

--8<---------------cut here---------------start------------->8---
g@batondor ~$ shepherd --version
shepherd (GNU Shepherd) 0.6.1
--8<---------------cut here---------------end--------------->8---

Thanks! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures



* bug#37309: ‘ssh-daemon’ service fails to start at boot
  2019-09-05 13:18                 ` bug#37309: ‘ssh-daemon’ service fails to start at boot Giovanni Biscuolo
@ 2019-09-08  4:19                   ` 宋文武
  2019-11-26 18:34                     ` Jelle Licht
  2019-12-03 20:12                   ` bug#37309: [PATCH] services: openssh: Restrict to IPv4 Leo Famulari
  2020-11-27 23:00                   ` bug#37309: ‘ssh-daemon’ service fails to start at boot Christopher Lemmer Webber
  2 siblings, 1 reply; 13+ messages in thread
From: 宋文武 @ 2019-09-08  4:19 UTC
  To: Giovanni Biscuolo; +Cc: 37309

Giovanni Biscuolo <g@xelera.eu> writes:

> Hi,
>
> Following a recent discussion on guix-sysadmin, I have to confirm the
> ssh-daemon issue, since it is still happening on some of the machines I
> administer.
>
> Previous possibly related bug reports are
> https://issues.guix.gnu.org/issue/30993 and
> https://issues.guix.gnu.org/issue/32197
>
> Unfortunately this issue is *not* easily reproducible: it depends on some
> timing factor that is mysterious (to me). AFAIU it does *not* depend on the
> shepherd version; it probably depends on "something" related to IPv6
> (details below).

Hello, thank you for this report. It's reproducible on my box, which has
an old hard disk, and disabling IPv6 for sshd does fix the issue for me...

>
> Andreas Enge <andreas@enge.fr> writes:
>
> [...]
>
>> My impression is that the problem is still there. I am quite certain it
>> happened when I rebooted dover, since I had to connect on the serial console
>> to manually restart the ssh service.
>
> I'm sure it happened when milano-guix-1 was rebooted due to data centre
> maintenance, and it happened yesterday to one of my personal Guix machines
> at the office.
>
> [...]
>
> My situation is similar to the one observed by Andreas:
>
>> Well, it is in /var/log/messages:
>> Aug  3 21:11:38 localhost sshd[360]: Server listening on 0.0.0.0 port 22.
>> Aug  3 21:11:55 localhost shepherd[1]: Service ssh-daemon could not be started.
>
> [...]
> Sep  4 21:46:02 localhost shepherd[1]: Service syslogd has been started.
> [...]
> Sep  4 21:46:03 localhost shepherd[1]: Service loopback has been started.
> [...]
> Sep  4 21:46:22 localhost vmunix: [    0.226337] PCI: Using configuration type 1 for base access
> Sep  4 21:46:09 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
> [...]
> Sep  4 21:46:24 localhost shepherd[1]: Service networking has been started.
> [...]
> Sep  4 21:46:12 localhost sshd[577]: Server listening on 0.0.0.0 port 22.
> [...]
> Sep  4 21:46:30 localhost vmunix: [    0.250107] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
> Sep  4 21:46:13 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
> [...]
> Sep  4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1
> [...]
> Sep  4 21:46:33 localhost shepherd[1]: Service ssh-daemon could not be started.
> [...]
> Sep  4 21:46:47 localhost vmunix: [    0.731142] Segment Routing with IPv6
>
>
> Please note the timing of the dhclient and the sshd processes: I
> copied the lines in the order they appear in /var/log/messages, but their
> timestamps are not chronological. Does this mean something, or is it
> irrelevant?
>
> So the sshd process started (as far as I can see there is no trace of
> it being stopped), and soon afterwards shepherd reported that ssh-daemon
> could not be started.
>
> Logging in from the console, I see that ssh-daemon is stopped but enabled:
>
> Status of ssh-daemon:
>   It is stopped.
>   It is enabled.
>   Provides (ssh-daemon).
>   Requires (syslogd loopback).
>   Conflicts with ().
>   Will be respawned.
>
>
> [...]

Yes, I think that when 'ssh-daemon' fails to start, shepherd should respawn
it until it succeeds, or disable it.  But looking at the code of
'make-forkexec-constructor': when 'pid-file' is used (as 'ssh-daemon'
does) and the timeout (%pid-file-timeout, 5 s by default) is reached, the
process gets a 'SIGTERM' and the constructor returns '#f' as its running
state, which won't be respawned (it's not a PID), I guess...

To ludo: Is my analysis correct?  It's not clear to me how to fix it so
that 'ssh-daemon' can be respawned, though...

>
> If I start it via `sudo herd start ssh-daemon`, it starts immediately,
> as in Andreas' experience:
>
>> Aug  3 21:13:10 localhost sshd[385]: Server listening on 0.0.0.0 port 22.
>> Aug  3 21:13:10 localhost sshd[385]: Server listening on :: port 22.
>> Aug  3 21:13:11 localhost shepherd[1]: Service ssh-daemon has been started.
>
> Sep  5 13:38:55 localhost sshd[745]: Server listening on 0.0.0.0 port 22.
> Sep  5 13:38:55 localhost sshd[745]: Server listening on :: port 22.
> Sep  5 13:38:55 localhost shepherd[1]: Service ssh-daemon has been started.
>
>
> Please notice the difference from above: this time the sshd server is
> also listening on the IPv6 address ::, while in the log above it was only
> listening on the IPv4 address 0.0.0.0.
>
> Does the failure have something to do with IPv6 not being available
> when sshd starts for the first time after a reboot?

I agree, since adding '(extra-content "ListenAddress 0.0.0.0")' to my
'openssh-configuration', so that sshd skips listening on IPv6, fixes this
issue for me.

A proper fix would be to respawn 'ssh-daemon' and to start it only once
IPv6 is "available" (I don't know what this means yet...).


* bug#37309: ‘ssh-daemon’ service fails to start at boot
  2019-09-08  4:19                   ` 宋文武
@ 2019-11-26 18:34                     ` Jelle Licht
  2019-11-29  8:40                       ` Giovanni Biscuolo
  0 siblings, 1 reply; 13+ messages in thread
From: Jelle Licht @ 2019-11-26 18:34 UTC
  To: 宋文武, Giovanni Biscuolo; +Cc: 37309

Hey 宋文武, Giovanni,

iyzsong@member.fsf.org (宋文武) writes:

> [...]
> Yes, I think that when 'ssh-daemon' fails to start, shepherd should respawn
> it until it succeeds, or disable it.  But looking at the code of
> 'make-forkexec-constructor': when 'pid-file' is used (as 'ssh-daemon'
> does) and the timeout (%pid-file-timeout, 5 s by default) is reached, the
> process gets a 'SIGTERM' and the constructor returns '#f' as its running
> state, which won't be respawned (it's not a PID), I guess...
>
> To ludo: Is my analysis correct?  It's not clear to me how to fix it so
> that 'ssh-daemon' can be respawned, though...

I think I am also running into a similar issue on my spinning rust based
T400. Is there a workaround available that does the above, or is that
analysis of the situation not correct?

Thanks,

Jelle


* bug#37309: ‘ssh-daemon’ service fails to start at boot
  2019-11-26 18:34                     ` Jelle Licht
@ 2019-11-29  8:40                       ` Giovanni Biscuolo
  2019-11-29  9:51                         ` Jelle Licht
  0 siblings, 1 reply; 13+ messages in thread
From: Giovanni Biscuolo @ 2019-11-29  8:40 UTC
  To: Jelle Licht, 宋文武; +Cc: 37309


Hi Jelle,

Jelle Licht <jlicht@fsfe.org> writes:

[...]

> I think I am also running into a similar issue on my spinning rust based
> T400. Is there a workaround available that does the above,

I added `(extra-content "ListenAddress 0.0.0.0")` to my
openssh-configuration, to only listen on IPv4 addresses:

--8<---------------cut here---------------start------------->8---
(service openssh-service-type
         (openssh-configuration
          (port-number 22)
          (extra-content "ListenAddress 0.0.0.0")
          (authorized-keys
           `(("g" ,(local-file "keys/ssh/g.pub"))
             ("hydra" ,(local-file "keys/ssh/hydra.pub"))))))
--8<---------------cut here---------------end--------------->8---

I rebooted a machine I can use for testing several times and it works
for me: could you please try it and report whether it also works for you?

[...]

Thanks! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures



* bug#37309: ‘ssh-daemon’ service fails to start at boot
  2019-11-29  8:40                       ` Giovanni Biscuolo
@ 2019-11-29  9:51                         ` Jelle Licht
  0 siblings, 0 replies; 13+ messages in thread
From: Jelle Licht @ 2019-11-29  9:51 UTC
  To: Giovanni Biscuolo, 宋文武; +Cc: 37309


Hi Giovanni,


Giovanni Biscuolo <g@xelera.eu> writes:

> Hi Jelle,
>
> Jelle Licht <jlicht@fsfe.org> writes:
>
> [...]
>
>> I think I am also running into a similar issue on my spinning rust based
>> T400. Is there a workaround available that does the above,
>
> I added `(extra-content "ListenAddress 0.0.0.0")` to my
> openssh-configuration, to only listen on IPv4 addresses:
>
> --8<---------------cut here---------------start------------->8---
> (service openssh-service-type
>          (openssh-configuration
>           (port-number 22)
>           (extra-content "ListenAddress 0.0.0.0")
>           (authorized-keys
>            `(("g" ,(local-file "keys/ssh/g.pub"))
>              ("hydra" ,(local-file "keys/ssh/hydra.pub"))))))
> --8<---------------cut here---------------end--------------->8---
>
> I rebooted a machine I can use for testing several times and it works
> for me: could you please try it and report whether it also works for you?

This, in combination with setting the pid-file-timeout to 30 seconds,
made everything work! I guess it is a combination of fun IPv6
interactions with extremely slow and busy spinning rust.

Thank you!

This still seems like a workaround rather than a proper fix, though; is
there something we can do to mitigate these issues in the first place?

- Jelle
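The thread does not say exactly how the 30-second timeout was set. One way it could be expressed is sketched below. The sketch assumes a locally modified copy of the ssh-daemon shepherd service from gnu/services/ssh.scm. It also assumes that Shepherd's make-forkexec-constructor accepts a #:pid-file-timeout keyword, and that openssh-command and pid-file are bound as in that file.

```scheme
;; Sketch only: a variant of the ssh-daemon service from gnu/services/ssh.scm.
;; OPENSSH-COMMAND and PID-FILE are assumed to be bound as in that file.
(shepherd-service
 (documentation "OpenSSH server.")
 (requirement '(syslogd loopback))
 (provision '(ssh-daemon ssh sshd))
 (start #~(make-forkexec-constructor #$openssh-command
                                     #:pid-file #$pid-file
                                     ;; Wait up to 30 seconds, instead of the
                                     ;; default %pid-file-timeout (5 seconds),
                                     ;; for sshd to write its PID file before
                                     ;; giving up on the service.
                                     #:pid-file-timeout 30))
 (stop #~(make-kill-destructor)))
```

A longer timeout can only help when sshd is slow to write its PID file (for instance on a busy disk). It would not help if sshd never gets as far as writing the PID file.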


* bug#37309: [PATCH] services: openssh: Restrict to IPv4.
  2019-09-05 13:18                 ` bug#37309: ‘ssh-daemon’ service fails to start at boot Giovanni Biscuolo
  2019-09-08  4:19                   ` 宋文武
@ 2019-12-03 20:12                   ` Leo Famulari
  2019-12-03 21:53                     ` Julien Lepiller
  2020-11-27 23:00                   ` bug#37309: ‘ssh-daemon’ service fails to start at boot Christopher Lemmer Webber
  2 siblings, 1 reply; 13+ messages in thread
From: Leo Famulari @ 2019-12-03 20:12 UTC
  To: 37309

This works around <https://issues.guix.info/issue/30993>.

* gnu/services/ssh.scm (<openssh-configuration>)[address-family]: New field.
(openssh-config-file): Use it.
* doc/guix.texi: Document it.
---
 doc/guix.texi        | 10 ++++++++++
 gnu/services/ssh.scm | 16 +++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 39eb25385c..cf0e141baf 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -13913,6 +13913,16 @@ This is a symbol specifying the logging level: @code{quiet}, @code{fatal},
 @code{error}, @code{info}, @code{verbose}, @code{debug}, etc.  See the man
 page for @file{sshd_config} for the full list of level names.
 
+@item @code{address-family} (default: @code{'inet})
+This is a symbol specifying which type of internet addresses should be
+handled by @command{sshd}.  The options are @code{inet} (IPv4),
+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
+@code{inet6}.  The upstream default in @code{any}.  However, we
+currently default to @code{inet} due to a nondeterministic
+@command{sshd} startup failure when using IPv6 on Guix.  See
+@uref{https://issues.guix.info/issue/30993, the bug report} for more
+information on this temporary limitation.
+
 @item @code{extra-content} (default: @code{""})
 This field can be used to append arbitrary text to the configuration file.  It
 is especially useful for elaborate configurations that cannot be expressed
diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
index d2dbb8f80d..7e25810eff 100644
--- a/gnu/services/ssh.scm
+++ b/gnu/services/ssh.scm
@@ -4,6 +4,7 @@
 ;;; Copyright © 2016 Julien Lepiller <julien@lepiller.eu>
 ;;; Copyright © 2017 Clément Lassieur <clement@lassieur.org>
 ;;; Copyright © 2019 Ricardo Wurmus <rekado@elephly.net>
+;;; Copyright © 2019 Leo Famulari <leo@famulari.name>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -340,7 +341,16 @@ The other options should be self-descriptive."
   ;; proposed in <https://bugs.gnu.org/27155>.  Keep it internal/undocumented
   ;; for now.
   (%auto-start?          openssh-auto-start?
-                         (default #t)))
+                         (default #t))
+
+  ;; Symbol
+  ;; XXX: This shouldn't be required, but due to limitations with IPv6
+  ;; on Guix, sshd often fails to start when it attempts to bind to both
+  ;; 0.0.0.0 and ::, because the IPv6 interface is not ready in time.
+  ;; Accepted options are inet (IPv4), inet6 (IPv6), or any (both).
+  ;; <https://issues.guix.info/issue/30993>
+  (address-family        openssh-configuration-address-family
+                         (default 'inet)))
 
 (define %openssh-accounts
   (list (user-group (name "sshd") (system? #t))
@@ -468,6 +478,10 @@ of user-name/file-like tuples."
                       (symbol->string
                        (openssh-configuration-log-level config))))
 
+           (format port "AddressFamily ~a\n"
+                   #$(symbol->string
+                      (openssh-configuration-address-family config)))
+
            ;; Add '/etc/authorized_keys.d/%u', which we populate.
            (format port "AuthorizedKeysFile \
  .ssh/authorized_keys .ssh/authorized_keys2 /etc/ssh/authorized_keys.d/%u\n")
-- 
2.24.0


* bug#37309: [PATCH] services: openssh: Restrict to IPv4.
  2019-12-03 20:12                   ` bug#37309: [PATCH] services: openssh: Restrict to IPv4 Leo Famulari
@ 2019-12-03 21:53                     ` Julien Lepiller
  2019-12-04 13:41                       ` Leo Famulari
  0 siblings, 1 reply; 13+ messages in thread
From: Julien Lepiller @ 2019-12-03 21:53 UTC
  To: 37309, leo

On 3 December 2019 21:12:51 GMT+01:00, Leo Famulari <leo@famulari.name> wrote:
>This works around <https://issues.guix.info/issue/30993>.
>
>* gnu/services/ssh.scm (<openssh-configuration>)[address-family]: New field.
>(openssh-config-file): Use it.
>* doc/guix.texi: Document it.
>---
> doc/guix.texi        | 10 ++++++++++
> gnu/services/ssh.scm | 16 +++++++++++++++-
> 2 files changed, 25 insertions(+), 1 deletion(-)
>
>diff --git a/doc/guix.texi b/doc/guix.texi
>index 39eb25385c..cf0e141baf 100644
>--- a/doc/guix.texi
>+++ b/doc/guix.texi
>@@ -13913,6 +13913,16 @@ This is a symbol specifying the logging level: @code{quiet}, @code{fatal},
> @code{error}, @code{info}, @code{verbose}, @code{debug}, etc.  See the man
> page for @file{sshd_config} for the full list of level names.
> 
>+@item @code{address-family} (default: @code{'inet})
>+This is a symbol specifying which type of internet addresses should be
>+handled by @command{sshd}.  The options are @code{inet} (IPv4),
>+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
>+@code{inet6}.  The upstream default in @code{any}.  However, we
default *is*
>+currently default to @code{inet} due to a nondeterministic
>+@command{sshd} startup failure when using IPv6 on Guix.  See
>+@uref{https://issues.guix.info/issue/30993, the bug report} for more
>+information on this temporary limitation.
>+
> @item @code{extra-content} (default: @code{""})
> This field can be used to append arbitrary text to the configuration file.  It
> is especially useful for elaborate configurations that cannot be expressed
>diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
>index d2dbb8f80d..7e25810eff 100644
>--- a/gnu/services/ssh.scm
>+++ b/gnu/services/ssh.scm
>@@ -4,6 +4,7 @@
> ;;; Copyright © 2016 Julien Lepiller <julien@lepiller.eu>
> ;;; Copyright © 2017 Clément Lassieur <clement@lassieur.org>
> ;;; Copyright © 2019 Ricardo Wurmus <rekado@elephly.net>
>+;;; Copyright © 2019 Leo Famulari <leo@famulari.name>
> ;;;
> ;;; This file is part of GNU Guix.
> ;;;
>@@ -340,7 +341,16 @@ The other options should be self-descriptive."
>   ;; proposed in <https://bugs.gnu.org/27155>.  Keep it internal/undocumented
>   ;; for now.
>   (%auto-start?          openssh-auto-start?
>-                         (default #t)))
>+                         (default #t))
>+
>+  ;; Symbol
>+  ;; XXX: This shouldn't be required, but due to limitations with IPv6
>+  ;; on Guix, sshd often fails to start when it attempts to bind to both
>+  ;; 0.0.0.0 and ::, because the IPv6 interface is not ready in time.
>+  ;; Accepted options are inet (IPv4), inet6 (IPv6), or any (both).
>+  ;; <https://issues.guix.info/issue/30993>
>+  (address-family        openssh-configuration-address-family
>+                         (default 'inet)))
> 
> (define %openssh-accounts
>   (list (user-group (name "sshd") (system? #t))
>@@ -468,6 +478,10 @@ of user-name/file-like tuples."
>                       (symbol->string
>                        (openssh-configuration-log-level config))))
> 
>+           (format port "AddressFamily ~a\n"
>+                   #$(symbol->string
>+                      (openssh-configuration-address-family config)))
>+
>            ;; Add '/etc/authorized_keys.d/%u', which we populate.
>            (format port "AuthorizedKeysFile \
>   .ssh/authorized_keys .ssh/authorized_keys2 /etc/ssh/authorized_keys.d/%u\n")


* bug#37309: [PATCH] services: openssh: Restrict to IPv4.
  2019-12-03 21:53                     ` Julien Lepiller
@ 2019-12-04 13:41                       ` Leo Famulari
  2019-12-10 16:47                         ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: Leo Famulari @ 2019-12-04 13:41 UTC
  To: Julien Lepiller; +Cc: 37309

On Tue, Dec 03, 2019 at 10:53:11PM +0100, Julien Lepiller wrote:
> On 3 December 2019 21:12:51 GMT+01:00, Leo Famulari <leo@famulari.name> wrote:
> >+@item @code{address-family} (default: @code{'inet})
> >+This is a symbol specifying which type of internet addresses should be
> >+handled by @command{sshd}.  The options are @code{inet} (IPv4),
> >+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
> >+@code{inet6}.  The upstream default in @code{any}.  However, we
> default *is*

Thanks!

This patch did make sshd work for me again.

However, as part of trying to debug this issue, I changed my system
configuration so that it uses dhcp-client-service and
wpa-supplicant-service instead of using Wicd. And now I can't reproduce
the bug anymore.

I guess that either 1) wpa_supplicant brings the network interfaces up
faster or 2) the state of the network interfaces is more accurately
captured with these services (in the sense of, is the network up?).

Tricky...

Does the patch help anybody else?
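The switch from Wicd described above might look roughly like this in an operating-system declaration. This is a hypothetical sketch, not Leo's actual configuration. The interface name, the wpa_supplicant configuration file path and the %my-network-services name are placeholders. It also assumes a Guix version whose wpa-supplicant-service-type takes a wpa-supplicant-configuration record.

```scheme
(use-modules (gnu)
             (gnu services networking))

;; Sketch: let wpa_supplicant associate with the Wi-Fi network and
;; dhclient obtain an address, instead of having Wicd manage both.
(define %my-network-services            ;hypothetical name
  (list (service wpa-supplicant-service-type
                 (wpa-supplicant-configuration
                  (interface "wlp3s0")  ;placeholder interface name
                  ;; Placeholder path to an existing wpa_supplicant.conf.
                  (config-file "/etc/wpa_supplicant/wpa_supplicant.conf")))
        (service dhcp-client-service-type)))

;; Then, in the operating-system declaration:
;;   (services (append %my-network-services %base-services))
```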


* bug#37309: [PATCH] services: openssh: Restrict to IPv4.
  2019-12-04 13:41                       ` Leo Famulari
@ 2019-12-10 16:47                         ` Ludovic Courtès
  0 siblings, 0 replies; 13+ messages in thread
From: Ludovic Courtès @ 2019-12-10 16:47 UTC
  To: Leo Famulari; +Cc: 37309

Hi Leo,

Leo Famulari <leo@famulari.name> writes:

> On Tue, Dec 03, 2019 at 10:53:11PM +0100, Julien Lepiller wrote:
>> On 3 December 2019 21:12:51 GMT+01:00, Leo Famulari <leo@famulari.name> wrote:
>> >+@item @code{address-family} (default: @code{'inet})
>> >+This is a symbol specifying which type of internet addresses should be
>> >+handled by @command{sshd}.  The options are @code{inet} (IPv4),
>> >+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
>> >+@code{inet6}.  The upstream default in @code{any}.  However, we
>> default *is*
>
> Thanks!
>
> This patch did make sshd work for me again.
>
> However, as part of trying to debug this issue, I changed my system
> configuration so that it uses dhcp-client-service and
> wpa-supplicant-service instead of using Wicd. And now I can't reproduce
> the bug anymore.
>
> I guess that either 1) wpa_supplicant brings the network interfaces up
> faster or 2) the state of the network interfaces is more accurately
> captured with these services (in the sense of, is the network up?).

Did anyone manage to get an strace log as was discussed in
<https://issues.guix.gnu.org/issue/30993>?

That would allow us to know where this is hanging exactly (probably
bind(2) on an IPv6 address.)

Thanks,
Ludo’.
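Short of an strace log, a cheaper first step might be to raise sshd's own log level so that its startup messages reach syslog. The sketch below uses the log-level field of openssh-configuration, which is documented just above the address-family addition in Leo's patch. It is an assumption that debug-level messages would reveal the failing call. Which log file receives them depends on the syslogd configuration.

```scheme
;; Sketch: have sshd log its startup steps at debug level via syslog,
;; presumably including its attempts to bind each listening socket.
(service openssh-service-type
         (openssh-configuration
          ;; One of the levels listed in the log-level documentation.
          (log-level 'debug)))
```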


* bug#37309: ‘ssh-daemon’ service fails to start at boot
  2019-09-05 13:18                 ` bug#37309: ‘ssh-daemon’ service fails to start at boot Giovanni Biscuolo
  2019-09-08  4:19                   ` 宋文武
  2019-12-03 20:12                   ` bug#37309: [PATCH] services: openssh: Restrict to IPv4 Leo Famulari
@ 2020-11-27 23:00                   ` Christopher Lemmer Webber
  2020-11-28  1:08                     ` Marius Bakke
  2 siblings, 1 reply; 13+ messages in thread
From: Christopher Lemmer Webber @ 2020-11-27 23:00 UTC
  To: Giovanni Biscuolo; +Cc: 37309

Giovanni Biscuolo writes:

> Hi,
>
> Following a recent discussion on guix-sysadmin, I have to confirm the
> ssh-daemon issue, since it is still happening on some of the machines I
> administer.
>
> Previous possibly related bug reports are
> https://issues.guix.gnu.org/issue/30993 and
> https://issues.guix.gnu.org/issue/32197
>
> Unfortunately this issue is *not* easily reproducible: it depends on some
> timing factor that is mysterious (to me). AFAIU it does *not* depend on the
> shepherd version; it probably depends on "something" related to IPv6
> (details below).

This issue continues to plague me, and has ever since I started to use
GuixSD.  However, it is much worse now that I am running Guix on
servers... I frequently have to log in via Linode's (nonfree!) web
console on every server that is rebooted and kick herd to restart
openssh.  Once I do that it's fine.

I don't think my Linode machine is on "spinning rust", so I don't think
this is the cause.  IPv6, maybe?  Dunno what.

However, I think that it's probably really a dependency issue somewhere:
shepherd is starting sshd before some other service it depends on has
been started.  But what?  Maybe something authentication-related, or
networking, or something.  But hm, networking is required...

I'm assuming others must be experiencing this still too... right?

Would really like to see it fixed.  It's one of the few things holding
me back from recommending Guix on servers to others.

Do others have any idea?

I noticed the lsh daemon requires networking.  Why doesn't openssh?

What about the following "fix"?

diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
index 1891db0487..c9bd62bab7 100644
--- a/gnu/services/ssh.scm
+++ b/gnu/services/ssh.scm
@@ -508,7 +508,7 @@ of user-name/file-like tuples."
 
   (list (shepherd-service
          (documentation "OpenSSH server.")
-         (requirement '(syslogd loopback))
+         (requirement '(syslogd networking loopback))
          (provision '(ssh-daemon ssh sshd))
          (start #~(make-forkexec-constructor #$openssh-command
                                              #:pid-file #$pid-file))





* bug#37309: ‘ssh-daemon’ service fails to start at boot
  2020-11-27 23:00                   ` bug#37309: ‘ssh-daemon’ service fails to start at boot Christopher Lemmer Webber
@ 2020-11-28  1:08                     ` Marius Bakke
  2020-12-03 20:38                       ` Leo Famulari
  0 siblings, 1 reply; 13+ messages in thread
From: Marius Bakke @ 2020-11-28  1:08 UTC
  To: Christopher Lemmer Webber, Giovanni Biscuolo; +Cc: 37309


Christopher Lemmer Webber <cwebber@dustycloud.org> writes:

> Giovanni Biscuolo writes:
>
>> Hi,
>>
>> Following a recent discussion on guix-sysadmin, I have to confirm the
>> ssh-daemon issue, since it is still happening on some of the machines I
>> administer.
>>
>> Previous possibly related bug reports are
>> https://issues.guix.gnu.org/issue/30993 and
>> https://issues.guix.gnu.org/issue/32197
>>
>> Unfortunately this issue is *not* easily reproducible: it depends on some
>> timing factor that is mysterious (to me). AFAIU it does *not* depend on the
>> shepherd version; it probably depends on "something" related to IPv6
>> (details below).
>
> This issue continues to plague me, and has ever since I started to use
> GuixSD.  However, it is much worse now that I am running Guix on
> servers... I frequently have to log in via Linode's (nonfree!) web
> console on every server that is rebooted and kick herd to restart
> openssh.  Once I do that it's fine.

Can you share an excerpt of /var/log/messages (ideally the whole boot
sequence) from when SSH failed to start?

> I don't think my Linode machine is on "spinning rust", so I don't think
> this is the cause.  IPv6, maybe?  Dunno what.
>
> However, I think that it's probably really a dependency issue somewhere:
> shepherd is starting sshd before some other service it depends on has
> been started.  But what?  Maybe something authentication-related, or
> networking, or something.  But hm, networking is required...
>
> I'm assuming others must be experiencing this still too... right?

FWIW I have never encountered this.  :-/

> Would really like to see it fixed.  It's one of the few things holding
> me back from recommending Guix on servers to others.
>
> Do others have any idea?
>
> I noticed the lsh daemon requires networking.  Why doesn't openssh?

It's really for legacy reasons, from before we had the Guix System
installer.  Back then, a common way to install was to run dhclient and
"herd start ssh-daemon" manually on the live image, so people could
do the installation over SSH:

  https://issues.guix.gnu.org/26548#5

Nowadays, the installer gives a nice and quick way to deploy a minimal
system, and I suspect the SSH method has fallen out of favor.

> What about the following "fix"?

[...]

>    (list (shepherd-service
>           (documentation "OpenSSH server.")
> -         (requirement '(syslogd loopback))
> +         (requirement '(syslogd networking loopback))

If it works for you, let's do this.  It would be good to find the
underlying cause though...

Not sure what to do about the installer, however: perhaps create
yet another undocumented field of openssh-service-type that makes the
networking requirement optional?
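The optional-requirement idea could look roughly like the Guile sketch below. It is only an illustration under stated assumptions: the record type <ssh-daemon-config>, its require-networking? field and the procedure ssh-daemon-requirement are hypothetical names, not part of Guix's actual openssh-configuration or openssh-service-type. The two requirement lists are the ones from the patch quoted above.

```scheme
(use-modules (srfi srfi-9))   ; define-record-type

;; Hypothetical stand-in for openssh-configuration: the real record has
;; many more fields, and no require-networking? field exists in Guix.
(define-record-type <ssh-daemon-config>
  (make-ssh-daemon-config require-networking?)
  ssh-daemon-config?
  (require-networking? ssh-daemon-config-require-networking?))

;; Return the Shepherd requirement list for ssh-daemon: the current
;; '(syslogd loopback), plus 'networking when the hypothetical flag is set.
(define (ssh-daemon-requirement config)
  (if (ssh-daemon-config-require-networking? config)
      '(syslogd networking loopback)
      '(syslogd loopback)))

;; Prints (syslogd networking loopback)
(display (ssh-daemon-requirement (make-ssh-daemon-config #t)))
(newline)
```

A server configuration would set the flag so that ssh-daemon waits for networking, while an installation image could leave it off so that starting ssh-daemon by hand does not also pull in a networking service.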


* bug#37309: ‘ssh-daemon’ service fails to start at boot
  2020-11-28  1:08                     ` Marius Bakke
@ 2020-12-03 20:38                       ` Leo Famulari
  2020-12-03 21:56                         ` Christopher Lemmer Webber
  0 siblings, 1 reply; 13+ messages in thread
From: Leo Famulari @ 2020-12-03 20:38 UTC (permalink / raw)
  To: Marius Bakke; +Cc: 37309

[-- Attachment #1: Type: text/plain, Size: 448 bytes --]

On Sat, Nov 28, 2020 at 02:08:34AM +0100, Marius Bakke wrote:
> Christopher Lemmer Webber <cwebber@dustycloud.org> writes:
> > I'm assuming others must be experiencing this still too... right?
> 
> FWIW I have never encountered this.  :-/

I reenabled IPv6 listening for sshd after updating to 1.2.0 and things
are working for now. The problem has always been intermittent for me in
the past.

Chris, are you using an old Thinkpad too?


* bug#37309: ‘ssh-daemon’ service fails to start at boot
  2020-12-03 20:38                       ` Leo Famulari
@ 2020-12-03 21:56                         ` Christopher Lemmer Webber
  0 siblings, 0 replies; 13+ messages in thread
From: Christopher Lemmer Webber @ 2020-12-03 21:56 UTC (permalink / raw)
  To: Leo Famulari; +Cc: 37309

Leo Famulari writes:

> On Sat, Nov 28, 2020 at 02:08:34AM +0100, Marius Bakke wrote:
>> Christopher Lemmer Webber <cwebber@dustycloud.org> writes:
>> > I'm assuming others must be experiencing this still too... right?
>> 
>> FWIW I have never encountered this.  :-/
>
> I reenabled IPv6 listening for sshd after updating to 1.2.0 and things
> are working for now. The problem has always been intermittent for me in
> the past.
>
> Chris, are you using an old Thinkpad too?

I did experience it on an old ThinkPad, though in this case it's
happening on the Linode server I'm running.  It's not particularly old,
but it's probably shared by many users and thus slower in some way.

That's part of what makes me think this is some kind of race
condition...

end of thread, other threads:[~2020-12-03 21:58 UTC | newest]

Thread overview: 13+ messages
     [not found] <87k1da6fdb.fsf@roquette.mug.biscuolo.net>
     [not found] ` <87y315t3hw.fsf@roquette.mug.biscuolo.net>
     [not found]   ` <87tvbhra2v.fsf@roquette.mug.biscuolo.net>
     [not found]     ` <87imrvhhpy.fsf@cbaines.net>
     [not found]       ` <874l3crjqr.fsf@roquette.mug.biscuolo.net>
     [not found]         ` <87k1c6p914.fsf@roquette.mug.biscuolo.net>
     [not found]           ` <20190817152031.GA3191@jurong>
     [not found]             ` <87pnkuyac0.fsf_-_@gnu.org>
     [not found]               ` <20190828181141.GA27765@jurong>
2019-09-05 13:18                 ` bug#37309: ‘ssh-daemon’ service fails to start at boot Giovanni Biscuolo
2019-09-08  4:19                   ` 宋文武
2019-11-26 18:34                     ` Jelle Licht
2019-11-29  8:40                       ` Giovanni Biscuolo
2019-11-29  9:51                         ` Jelle Licht
2019-12-03 20:12                   ` bug#37309: [PATCH] services: openssh: Restrict to IPv4 Leo Famulari
2019-12-03 21:53                     ` Julien Lepiller
2019-12-04 13:41                       ` Leo Famulari
2019-12-10 16:47                         ` Ludovic Courtès
2020-11-27 23:00                   ` bug#37309: ‘ssh-daemon’ service fails to start at boot Christopher Lemmer Webber
2020-11-28  1:08                     ` Marius Bakke
2020-12-03 20:38                       ` Leo Famulari
2020-12-03 21:56                         ` Christopher Lemmer Webber

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
