unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
@ 2018-03-29 20:08 Leo Famulari
  2018-04-06  8:21 ` Ludovic Courtès
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Leo Famulari @ 2018-03-29 20:08 UTC (permalink / raw)
  To: 30993


[-- Attachment #1.1: Type: text/plain, Size: 2006 bytes --]

Since the update to Shepherd 0.4.0, I've found that OpenSSH's sshd is
killed almost immediately after it starts with signal 15. I confirmed
the issue started with the Shepherd upgrade by bisecting our Git
history.

I can reproduce the issue from commit
b6beda1d6b9093a8493b5c3cde33ed522242c451 (gnu: Add botan.).

One interesting tidbit is that the PID file '/var/run/sshd.pid' is not
created anymore. And if I create an empty PID file by hand, it is
removed after trying to start the ssh-daemon service. Also, the sshd
user's home '/var/run/sshd' does not exist, and is similarly removed if
it does exist.

I ran the OpenSSH system test `make check-system TESTS=openssh` and it
failed when it could not find the PID file. It passed on another
non-GuixSD machine. The failing machine is relatively slow and lacks
KVM: a ThinkPad x200s.

After boot, trying to start the service again with `herd start
ssh-daemon` gives the same result.

I modified the sshd invocation to print some debug output ('-d -E
/tmp/sshd.log') and this is what it shows:

------
debug1: sshd version OpenSSH_7.6, OpenSSL 1.0.2o  27 Mar 2018
debug1: private host key #0: ssh-rsa SHA256:REDACTED
debug1: private host key #1: ssh-dss SHA256:REDACTED
debug1: private host key #2: ecdsa-sha2-nistp256 SHA256:REDACTED
debug1: private host key #3: ssh-ed25519 SHA256:REDACTED
debug1: rexec_argv[0]='/gnu/store/az7vib8gk16fybhshh5xpkljmgxyrs4k-openssh-7.6p1/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: rexec_argv[2]='-d'
debug1: rexec_argv[3]='-E'
debug1: rexec_argv[4]='/tmp/sshd.log'
debug1: rexec_argv[5]='-f'
debug1: rexec_argv[6]='/gnu/store/miy7xg5j4fg3mn04mcl27awmcl6s97ss-sshd_config'
debug1: Set /proc/self/oom_score_adj from 0 to -1000
debug1: Bind to port 22 on 0.0.0.0.
Server listening on 0.0.0.0 port 22.
debug1: Bind to port 22 on ::.
Server listening on :: port 22.
Received signal 15; terminating.
------

My system configuration file, the shepherd log messages, and the OpenSSH
system test logs are attached. Any ideas?

[-- Attachment #1.2: shepherd.log --]
[-- Type: text/plain, Size: 3664 bytes --]

Mar 29 15:47:32 localhost shepherd[1]: Service syslogd has been started. 
Mar 29 15:47:33 localhost shepherd[1]: Service loopback has been started. 
Mar 29 15:47:34 localhost shepherd[1]: Service virtual-terminal has been started. 
Mar 29 15:47:35 localhost shepherd[1]: Service term-tty6 has been started. 
Mar 29 15:47:35 localhost shepherd[1]: Service term-tty5 has been started. 
Mar 29 15:47:36 localhost shepherd[1]: Service term-tty4 has been started. 
Mar 29 15:47:38 localhost shepherd[1]: Service term-tty3 has been started. 
Mar 29 15:47:39 localhost shepherd[1]: Service term-tty2 has been started. 
Mar 29 15:47:41 localhost shepherd[1]: Service term-tty1 has been started. 
Mar 29 15:47:43 localhost shepherd[1]: Service term-auto could not be started. 
Mar 29 15:47:44 localhost shepherd[1]: Service console-font-tty1 has been started. 
Mar 29 15:47:46 localhost shepherd[1]: Service console-font-tty2 has been started. 
Mar 29 15:47:48 localhost shepherd[1]: Service console-font-tty3 has been started. 
Mar 29 15:47:49 localhost shepherd[1]: Service console-font-tty4 has been started. 
Mar 29 15:47:50 localhost shepherd[1]: Service console-font-tty5 has been started. 
Mar 29 15:47:50 localhost shepherd[1]: Service console-font-tty6 has been started. 
Mar 29 15:47:51 localhost shepherd[1]: Service dbus-system has been started. 
Mar 29 15:47:51 localhost shepherd[1]: Service networking has been started. 
Mar 29 15:47:52 localhost shepherd[1]: Service ntpd has been started. 
Mar 29 15:47:56 localhost shepherd[1]: Service ssh-daemon could not be started. 
Mar 29 15:47:59 localhost shepherd[1]: Service gpm has been started. 
Mar 29 15:48:29 localhost vmunix: [    5.486795] shepherd[1]: Service root has been started.
Mar 29 15:48:29 localhost vmunix: [    7.379651] shepherd[1]: starting services...
Mar 29 15:48:29 localhost vmunix: [    7.381562] shepherd[1]: Service root-file-system has been started.
Mar 29 15:48:29 localhost vmunix: [    7.383337] shepherd[1]: Service user-file-systems has been started.
Mar 29 15:48:29 localhost vmunix: [    7.625406] shepherd[1]: Service file-system-/home has been started.
Mar 29 15:48:29 localhost vmunix: [    7.627645] shepherd[1]: Service file-system-/dev/pts has been started.
Mar 29 15:48:29 localhost vmunix: [    7.629909] shepherd[1]: Service file-system-/dev/shm has been started.
Mar 29 15:48:29 localhost vmunix: [    7.632089] shepherd[1]: Service file-system-/gnu/store has been started.
Mar 29 15:48:29 localhost vmunix: [    7.633924] shepherd[1]: Service file-systems has been started.
Mar 29 15:48:29 localhost vmunix: [    7.731477] shepherd[1]: waiting for udevd...
Mar 29 15:48:29 localhost vmunix: [    8.331344] shepherd[1]: Service udev has been started.
Mar 29 15:48:30 localhost vmunix: [    8.446599] shepherd[1]: Service urandom-seed has been started.
Mar 29 15:48:30 localhost vmunix: [    8.448462] shepherd[1]: Service user-processes has been started.
Mar 29 15:48:30 localhost vmunix: [    8.450424] shepherd[1]: Service host-name has been started.
Mar 29 15:48:30 localhost vmunix: [    8.546746] shepherd[1]: Service user-homes could not be started.
Mar 29 15:48:30 localhost vmunix: [    9.554051] shepherd[1]: Service nscd has been started.
Mar 29 15:48:30 localhost vmunix: [    9.606182] shepherd[1]: Service guix-daemon has been started.
Mar 29 15:49:21 localhost shepherd[1]: Respawning term-tty2. 
Mar 29 15:49:21 localhost shepherd[1]: Service term-tty2 has been started. 
Mar 29 15:49:28 localhost shepherd[1]: Respawning term-tty1. 
Mar 29 15:49:28 localhost shepherd[1]: Service term-tty1 has been started. 

[-- Attachment #1.3: system.scm --]
[-- Type: text/plain, Size: 2924 bytes --]

;; This is an operating system configuration template
;; for a "bare bones" setup, with no X11 display server.

(use-modules (gnu))
(use-service-modules networking
                     dbus
                     ssh
                     sysctl)
(use-package-modules certs
                     curl
                     ssh
                     rsync
                     tmux
                     version-control
                     vim)

(operating-system
  (host-name "computer")
  (timezone "America/New_York")
  (locale "en_US.UTF-8")

  (kernel-arguments
   '(;; Console resolution
     "gfxpayload=1440x900x16,1440x900"

     ;; console cursor. stops the blinking but the colors are bad
     "vt.cur.default=0x520032"

     "consoleblank=120"
     "quiet"))

  ;; Assuming /dev/sdX is the target hard disk, and "my-root" is
  ;; the label of the target root file system.
  (bootloader (grub-configuration (target "/dev/sda")
                                  (terminal-outputs '(console))))
  (file-systems (cons* (file-system
                        (device "my-root")
                        (title 'label)
                        (mount-point "/")
                        (type "ext4"))
                       (file-system
                        (device "home")
                        (title 'label)
                        (mount-point "/home")
                        (type "ext4"))
                      %base-file-systems))

  ;; This is where user accounts are specified.  The "root"
  ;; account is implicit, and is initially created with the
  ;; empty password.
  (users (cons (user-account
                (name "leo")
                (comment "")
                (group "users")

                ;; Adding the account to the "wheel" group
                ;; makes it a sudoer.  Adding it to "audio"
                ;; and "video" allows the user to play sound
                ;; and access the webcam.
                (supplementary-groups '("wheel" "netdev" "audio"))
                (home-directory (string-append "/home/" name)))
               %base-user-accounts))

  ;; Globally-installed packages.
  (packages (cons* curl
                   git
                   openssh
                   mosh
                   nss-certs
                   rsync
                   tmux
                   vim
                   %base-packages))

  (services
    (cons* (dbus-service)
           (gpm-service)
           (service openssh-service-type
                      (openssh-configuration
                        (password-authentication? #f)))
           (ntp-service)
           (wicd-service)
           (modify-services %base-services
             (guix-service-type config =>
                                (guix-configuration
                                  (inherit config)
                                  (substitute-urls '("https://berlin.guixsd.org https://mirror.hydra.gnu.org"))))))))

[-- Attachment #1.4: check.log --]
[-- Type: text/plain, Size: 27300 bytes --]

Compiling Scheme modules...
Running 1 system tests...
The following derivation will be built:
   /gnu/store/z4whl2ghvwwhyh323ihh13n1y8j6f6bq-openssh.drv
@ build-started /gnu/store/z4whl2ghvwwhyh323ihh13n1y8j6f6bq-openssh.drv - x86_64-linux /var/log/guix/drvs/z4//whl2ghvwwhyh323ihh13n1y8j6f6bq-openssh.drv.bz2
^[c^[[?7l^[[2J^[[0mSeaBIOS (version rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org)


iPXE (http://ipxe.org) 00:03.0 C980 PCI2.10 PnP PMM+0FF91300+0FEF1300 C980
Press Ctrl-B to configure iPXE (PCI 00:03.0)...\r                                                                               


Booting from ROM..^[c^[[?7l^[[2J^[[0m.\r[    0.000000] Linux version 4.15.13-gnu (nixbld@) (gcc version 7.3.0 (GCC)) #1 SMP 1
[    0.000000] Command line: console=ttyS0 --root=/dev/vda1 --system=/gnu/store/lli9bx5hq3hqb1d2vshm41sc2whxm36m-system --load=/gnu/store/lli9bx5hq3hqb1d2vshm41sc2whxm36m-system/boot panic=1
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: x87 FPU will use FXSAVE
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000ffddfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000ffde000-0x000000000fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] random: fast init done
[    0.000000] SMBIOS 2.8 present.
[    0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[    0.000000] e820: last_pfn = 0xffde max_arch_pfn = 0x400000000
[    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[    0.000000] found SMP MP-table at [mem 0x000f5b60-0x000f5b6f] mapped at [        (ptrval)]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] RAMDISK: [mem 0x0f512000-0x0ffcffff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F5960 000014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 0x000000000FFE15FC 000030 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 0x000000000FFE1458 000074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 0x000000000FFE0040 001418 (v01 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACS 0x000000000FFE0000 000040
[    0.000000] ACPI: APIC 0x000000000FFE154C 000078 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 0x000000000FFE15C4 000038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000000ffddfff]
[    0.000000] NODE_DATA(0) allocated [mem 0x0ffda000-0x0ffddfff]
[    0.000000] tsc: Using PIT calibration value
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000000001000-0x000000000ffddfff]
[    0.000000]   Normal   empty
[    0.000000]   Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x000000000ffddfff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000000ffddfff]
[    0.000000] Reserved but unavailable: 98 pages
[    0.000000] ACPI: PM-Timer IO Port: 0x608
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[    0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[    0.000000] e820: [mem 0x10000000-0xfffbffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on bare hardware
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] percpu: Embedded 45 pages/cpu @        (ptrval) s146072 r8192 d30056 u2097152
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64359
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: console=ttyS0 --root=/dev/vda1 --system=/gnu/store/lli9bx5hq3hqb1d2vshm41sc2whxm36m-system --load=/gnu/store/lli9bx5hq3hqb1d2vshm41sc2whxm36m-system/boot panic=1
[    0.000000] Memory: 223164K/261616K available (12300K kernel code, 1508K rwdata, 3840K rodata, 1760K init, 1164K bss, 38452K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] ftrace: allocating 37456 entries in 147 pages
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 16640, nr_irqs: 256, preallocated irqs: 16
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [ttyS0] enabled
[    0.000000] ACPI: Core revision 20170831
[    0.000000] ACPI: 1 ACPI AML tables successfully acquired and loaded
[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[    0.008000] APIC: Switch to symmetric I/O mode setup
[    0.020000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.040000] tsc: Fast TSC calibration using PIT
[    0.044000] tsc: Detected 1862.112 MHz processor
[    0.044000] Calibrating delay loop (skipped), value calculated using timer frequency.. 3724.22 BogoMIPS (lpj=7448448)
[    0.044000] pid_max: default: 32768 minimum: 301
[    0.044000] Security Framework initialized
[    0.044000] Yama: becoming mindful.
[    0.053236] AppArmor: AppArmor initialized
[    0.059332] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.061386] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.064452] Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
[    0.066201] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes)
[    0.098887] mce: CPU supports 10 MCE banks
[    0.102074] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.103491] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[    0.104170] Spectre V2 : Spectre mitigation: LFENCE not serializing, switching to generic retpoline
[    0.106545] Spectre V2 : Mitigation: Full generic retpoline
[    0.108109] Spectre V2 : Spectre v2 mitigation: Filling RSB on context switch
[    0.817807] Freeing SMP alternatives memory: 28K
[    0.864000] smpboot: CPU0: AMD QEMU Virtual CPU version 2.5+ (family: 0x6, model: 0x6, stepping: 0x3)
[    0.873074] Performance Events: PMU not available due to virtualization, using software events only.
[    0.876559] Hierarchical SRCU implementation.
[    0.893597] NMI watchdog: Perf event create on CPU 0 failed with -2
[    0.895131] NMI watchdog: Perf NMI watchdog permanently disabled
[    0.901277] smp: Bringing up secondary CPUs ...
[    0.902307] smp: Brought up 1 node, 1 CPU
[    0.903445] smpboot: Max logical packages: 1
[    0.904070] smpboot: Total of 1 processors activated (3724.22 BogoMIPS)
[    0.932615] devtmpfs: initialized
[    0.940082] x86/mm: Memory block size: 128MB
[    0.947450] evm: security.selinux
[    0.948000] evm: security.SMACK64
[    0.948000] evm: security.SMACK64EXEC
[    0.948000] evm: security.SMACK64TRANSMUTE
[    0.948000] evm: security.SMACK64MMAP
[    0.948000] evm: security.apparmor
[    0.948000] evm: security.ima
[    0.948839] evm: security.capability
[    0.957213] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.960350] futex hash table entries: 256 (order: 2, 16384 bytes)
[    0.967828] pinctrl core: initialized pinctrl subsystem
[    0.976073] RTC time: 20:00:31, date: 03/29/18
[    0.984579] NET: Registered protocol family 16
[    0.990463] audit: initializing netlink subsys (disabled)
[    0.996741] audit: type=2000 audit(1522353629.992:1): state=initialized audit_enabled=0 res=1
[    1.005216] cpuidle: using governor ladder
[    1.006745] cpuidle: using governor menu
[    1.013354] ACPI: bus type PCI registered
[    1.014768] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    1.023534] PCI: Using configuration type 1 for base access
[    1.085592] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    1.099233] ACPI: Added _OSI(Module Device)
[    1.100108] ACPI: Added _OSI(Processor Device)
[    1.101286] ACPI: Added _OSI(3.0 _SCP Extensions)
[    1.102529] ACPI: Added _OSI(Processor Aggregator Device)
[    1.145775] ACPI: Interpreter enabled
[    1.148602] ACPI: (supports S0 S3 S4 S5)
[    1.149762] ACPI: Using IOAPIC for interrupt routing
[    1.152337] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    1.159380] ACPI: Enabled 2 GPEs in block 00 to 0F
[    1.253283] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    1.255573] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    1.256543] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    1.260981] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    1.278069] acpiphp: Slot [3] registered
[    1.279352] acpiphp: Slot [4] registered
[    1.280287] acpiphp: Slot [5] registered
[    1.281631] acpiphp: Slot [6] registered
[    1.282983] acpiphp: Slot [7] registered
[    1.284367] acpiphp: Slot [8] registered
[    1.285764] acpiphp: Slot [9] registered
[    1.287156] acpiphp: Slot [10] registered
[    1.288368] acpiphp: Slot [11] registered
[    1.289795] acpiphp: Slot [12] registered
[    1.292375] acpiphp: Slot [13] registered
[    1.293745] acpiphp: Slot [14] registered
[    1.295116] acpiphp: Slot [15] registered
[    1.296403] acpiphp: Slot [16] registered
[    1.297771] acpiphp: Slot [17] registered
[    1.299145] acpiphp: Slot [18] registered
[    1.300370] acpiphp: Slot [19] registered
[    1.301737] acpiphp: Slot [20] registered
[    1.303094] acpiphp: Slot [21] registered
[    1.304407] acpiphp: Slot [22] registered
[    1.305784] acpiphp: Slot [23] registered
[    1.308245] acpiphp: Slot [24] registered
[    1.309614] acpiphp: Slot [25] registered
[    1.311029] acpiphp: Slot [26] registered
[    1.311661] acpiphp: Slot [27] registered
[    1.312438] acpiphp: Slot [28] registered
[    1.313040] acpiphp: Slot [29] registered
[    1.313628] acpiphp: Slot [30] registered
[    1.314215] acpiphp: Slot [31] registered
[    1.315056] PCI host bridge to bus 0000:00
[    1.315674] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    1.316110] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    1.316627] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    1.317171] pci_bus 0000:00: root bus resource [mem 0x10000000-0xfebfffff window]
[    1.317711] pci_bus 0000:00: root bus resource [mem 0x100000000-0x17fffffff window]
[    1.318509] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.342323] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    1.344151] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    1.345931] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    1.348113] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    1.354756] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4 ACPI
[    1.356162] pci 0000:00:01.3: quirk: [io  0x0700-0x070f] claimed by PIIX4 SMB
[    1.491413] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    1.492368] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    1.494362] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    1.496000] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    1.496000] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[    1.507649] pci 0000:00:02.0: vgaarb: setting as boot VGA device
[    1.508000] pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    1.508000] pci 0000:00:02.0: vgaarb: bridge control possible
[    1.508000] vgaarb: loaded
[    1.513997] SCSI subsystem initialized
[    1.517130] ACPI: bus type USB registered
[    1.518474] usbcore: registered new interface driver usbfs
[    1.519502] usbcore: registered new interface driver hub
[    1.520242] usbcore: registered new device driver usb
[    1.523103] EDAC MC: Ver: 3.0.0
[    1.529480] PCI: Using ACPI for IRQ routing
[    1.545043] NetLabel: Initializing
[    1.545397] NetLabel:  domain hash size = 128
[    1.545746] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    1.549527] NetLabel:  unlabeled traffic allowed by default
[    1.553776] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    1.554817] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    1.555416] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    1.564658] clocksource: Switched to clocksource hpet
[    1.853027] VFS: Disk quotas dquot_6.6.0
[    1.853871] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    1.863285] AppArmor: AppArmor Filesystem Enabled
[    1.864157] pnp: PnP ACPI init
[    1.878205] pnp: PnP ACPI: found 6 devices
[    1.927881] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.932669] NET: Registered protocol family 2
[    1.943514] TCP established hash table entries: 2048 (order: 2, 16384 bytes)
[    1.944165] TCP bind hash table entries: 2048 (order: 3, 32768 bytes)
[    1.944165] TCP: Hash tables configured (established 2048 bind 2048)
[    1.948669] UDP hash table entries: 256 (order: 1, 8192 bytes)
[    1.949486] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[    1.953254] NET: Registered protocol family 1
[    1.954338] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    1.954932] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    1.955645] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    1.956854] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    1.968936] Trying to unpack rootfs image as initramfs...
[    3.798306] Freeing initrd memory: 11000K
[    3.802929] Scanning for low memory corruption every 60 seconds
[    3.815440] Initialise system trusted keyrings
[    3.818763] workingset: timestamp_bits=40 max_order=16 bucket_order=0
[    3.846004] zbud: loaded
[    3.862888] Allocating IMA blacklist keyring.
[    3.899156] Key type asymmetric registered
[    3.899667] Asymmetric key parser 'x509' registered
[    3.901802] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247)
[    3.903936] io scheduler noop registered
[    3.904665] io scheduler deadline registered (default)
[    3.906720] io scheduler cfq registered
[    3.917369] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    3.922238] ACPI: Power Button [PWRF]
[    3.931398] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    3.958457] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    4.013121] Linux agpgart interface v0.103
[    4.095282] brd: module loaded
[    4.127041] loop: module loaded
[    4.149868] scsi host0: ata_piix
[    4.154499] scsi host1: ata_piix
[    4.156069] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc100 irq 14
[    4.156847] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc108 irq 15
[    4.167823] libphy: Fixed MDIO Bus: probed
[    4.168156] tun: Universal TUN/TAP device driver, 1.6
[    4.170968] PPP generic driver version 2.4.2
[    4.173117] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    4.173808] ehci-pci: EHCI PCI platform driver
[    4.174631] ehci-platform: EHCI generic platform driver
[    4.175286] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    4.175883] ohci-pci: OHCI PCI platform driver
[    4.177008] ohci-platform: OHCI generic platform driver
[    4.177680] uhci_hcd: USB Universal Host Controller Interface driver
[    4.181524] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    4.188465] serio: i8042 KBD port at 0x60,0x64 irq 1
[    4.189322] serio: i8042 AUX port at 0x60,0x64 irq 12
[    4.194312] mousedev: PS/2 mouse device common for all mice
[    4.201766] rtc_cmos 00:00: RTC can wake from S4
[    4.206020] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    4.210802] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[    4.212653] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram, hpet irqs
[    4.213502] i2c /dev entries driver
[    4.215174] device-mapper: uevent: version 1.0.3
[    4.217866] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[    4.219362] ledtrig-cpu: registered to indicate activity on CPUs
[    4.230168] NET: Registered protocol family 10
[    4.245127] Segment Routing with IPv6
[    4.246373] NET: Registered protocol family 17
[    4.247375] Key type dns_resolver registered
[    4.251796] sched_clock: Marking stable (4248154360, 0)->(4361042007, -112887647)
[    4.259188] registered taskstats version 1
[    4.259661] Loading compiled-in X.509 certificates
[    4.262246] zswap: loaded using pool lzo/zbud
[    4.285094] Key type big_key registered
[    4.289432] Key type trusted registered
[    4.293728] Key type encrypted registered
[    4.294241] AppArmor: AppArmor sha1 policy hashing enabled
[    4.295329] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
[    4.301130] evm: HMAC attrs: 0x1
[    4.306850]   Magic number: 10:245:44
[    4.309171] rtc_cmos 00:00: setting system clock to 2018-03-29 20:00:34 UTC (1522353634)
[    4.315004] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    4.315523] EDD information not available.
[    4.338845] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[    4.342576] ata2.00: configured for MWDMA2
[    4.362669] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
[    4.389879] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[    4.390735] cdrom: Uniform CD-ROM driver Revision: 3.20
[    4.399275] sr 1:0:0:0: Attached scsi generic sg0 type 5
[    4.474085] Freeing unused kernel memory: 1760K
[    4.474622] Write protecting the kernel read-only data: 18432k
[    4.481944] Freeing unused kernel memory: 2024K
[    4.488199] Freeing unused kernel memory: 256K
GC Warning: pthread_getattr_np or pthread_attr_getstack failed for main thread
GC Warning: Couldn't read /proc/stat
[    4.826412] tsc: Refined TSC clocksource calibration: 1849.226 MHz
[    4.827283] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x354f9f4e667, max_idle_ns: 881590762287 ns
[    5.861133] clocksource: Switched to clocksource tsc
Welcome, this is GNU's early boot Guile.
Use '--repl' for an initrd REPL.

loading kernel modules...
[    6.068166] usbcore: registered new interface driver usb-storage
[    6.114030] usbcore: registered new interface driver uas
[    6.229953] hidraw: raw HID events driver (C) Jiri Kosina
[    6.246509] usbcore: registered new interface driver usbhid
[    6.247023] usbhid: USB HID core driver
[    6.457581] isci: Intel(R) C600 SAS Controller Driver - version 1.2.0
[    6.980292] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    7.459346] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[    7.939955] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    8.419789] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[    9.047289]  vda: vda1 vda2
[    9.199424] FS-Cache: Loaded
[    9.234168] 9pnet: Installing 9P2000 support
[    9.246603] 9p: Installing v9fs 9p2000 file system support
[    9.248614] FS-Cache: Netfs '9p' registered for caching
[    9.357039] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
loading '/gnu/store/lli9bx5hq3hqb1d2vshm41sc2whxm36m-system/boot'...
making '/gnu/store/lli9bx5hq3hqb1d2vshm41sc2whxm36m-system' the current system...
setting up setuid programs in '/run/setuid-programs'...
populating /etc from /gnu/store/d1g43dj80wph83ag2zldiiz823gs4wz3-etc...
adding user 'root'...
adding group 'root'...
adding group 'wheel'...
adding group 'users'...
adding group 'nogroup'...
adding group 'tty'...
adding group 'dialout'...
adding group 'kmem'...
adding group 'input'...
adding group 'video'...
adding group 'audio'...
adding group 'netdev'...
adding group 'lp'...
adding group 'disk'...
adding group 'floppy'...
adding group 'cdrom'...
adding group 'tape'...
adding group 'kvm'...
adding group 'guixbuild'...
adding group 'sshd'...
usermod: no changes
adding user 'alice'...
[   20.354756] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[   20.361090] clocksource:                       'hpet' wd_now: 7bf2c9db wd_last: 778761fc mask: ffffffff
[   20.365593] clocksource:                       'tsc' cs_now: 9bcfd4001 cs_last: 9848ecebb mask: ffffffffffffffff
[   20.368620] tsc: Marking TSC unstable due to clocksource watchdog
[   20.370720] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[   20.373207] sched_clock: Marking unstable (20370599229, 13198)<-(20483512433, -112887647)
[   20.397680] clocksource: Switched to clocksource hpet
adding user 'nobody'...
adding user 'guixbuilder01'...
adding user 'guixbuilder02'...
adding user 'guixbuilder03'...
adding user 'guixbuilder04'...
adding user 'guixbuilder05'...
adding user 'guixbuilder06'...
adding user 'guixbuilder07'...
adding user 'guixbuilder08'...
adding user 'guixbuilder09'...
adding user 'guixbuilder10'...
adding user 'sshd'...
registering public key '/gnu/store/xzz0xn80jp48fzy5xz4k8nmwwz1badgc-guix-0.14.0-9.bdf0c64/share/guix/hydra.gnu.org.pub'...
[   28.866401] random: crng init done
ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519 
[   40.483832] shepherd[1]: Service root has been started.
[   45.306274] shepherd[1]: starting services...
[   45.347538] shepherd[1]: Service root-file-system has been started.
[   45.387508] shepherd[1]: Service user-file-systems has been started.
[   45.445802] shepherd[1]: Service file-system-/dev/pts has been started.
[   45.493677] shepherd[1]: Service file-system-/dev/shm has been started.
[   45.533174] shepherd[1]: Service file-systems has been started.
[   45.791813] shepherd[1]: waiting for udevd...
[   46.497694] shepherd[1]: waiting for udevd...
[   46.797757] udevd[264]: starting version 3.2.4
[   47.411495] udevd[264]: starting eudev-3.2.4
[   47.471732] udevd[264]: no sender credentials received, message ignored
[   49.356174] parport_pc 00:04: reported by Plug and Play ACPI
[   49.393292] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[   49.431225] Floppy drive(s): fd0 is 2.88M AMI BIOS
[   49.487746] FDC 0 is a S82078B
[   49.743631] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0x700, revision 0
[   50.091114] input: PC Speaker as /devices/platform/pcspkr/input/input3
[   50.786448] Error: Driver 'pcspkr' is already registered, aborting...
[   51.535618] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input4
[   52.508577] kvm: Nested Virtualization enabled
[   52.913484] ppdev: user-space parallel port driver
[   53.089682] Error: Driver 'pcspkr' is already registered, aborting...
[   53.289733] shepherd[1]: Service udev has been started.
[   53.470334] shepherd[1]: Service urandom-seed has been started.
[   53.509739] shepherd[1]: Service user-processes has been started.
[   53.551190] shepherd[1]: Service host-name has been started.
[   53.705528] shepherd[1]: Service user-homes could not be started.
nscd: 289 monitoring file `/etc/hosts` (1)
nscd: 289 monitoring directory `/etc` (2)
nscd: 289 monitoring file `/etc/resolv.conf` (3)
nscd: 289 monitoring directory `/etc` (2)
nscd: 289 monitoring file `/etc/services` (4)
nscd: 289 monitoring directory `/etc` (2)
[   54.801392] shepherd[1]: Service nscd has been started.
[   54.902220] shepherd[1]: Service guix-daemon has been started.


This is the GNU system.  Welcome.
komputilo login: shepherd: Service ssh-daemon could not be started.
Backtrace:
           4 (primitive-load "/gnu/store/gzysid4gy412hnd0jhj72c3c1zb?")
In ice-9/eval.scm:
    619:8  3 (_ #f)
    619:8  2 (_ #(#(#<directory (guile-user) 7ce140> #<test-run?>) #))
   293:34  1 (_ #(#(#<directory (guile-user) 7ce140> #<test-run?>) #))
In unknown file:
           0 (scm-error misc-error #f "~A ~S" ("file didn't show?" ?) ?)

ERROR: In procedure scm-error:
file didn't show up "/var/run/sshd.pid"
QEMU runs as PID 4
connected to QEMU's monitor
read QEMU monitor prompt
connected to guest REPL
%%%% Starting test ssh-daemon  (Writing full log to "ssh-daemon.log")
marionette is ready
note: keeping build directory `/tmp/guix-build-openssh.drv-0'
builder for `/gnu/store/z4whl2ghvwwhyh323ihh13n1y8j6f6bq-openssh.drv' failed with exit code 1
@ build-failed /gnu/store/z4whl2ghvwwhyh323ihh13n1y8j6f6bq-openssh.drv - 1 builder for `/gnu/store/z4whl2ghvwwhyh323ihh13n1y8j6f6bq-openssh.drv' failed with exit code 1
TOTAL: 1
FAIL: /gnu/store/rs538fa2y5vqg76fcp1r5pyd6gcg29y0-openssh

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.
make: *** [Makefile:5232: check-system] Error 1

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-03-29 20:08 bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Leo Famulari
@ 2018-04-06  8:21 ` Ludovic Courtès
  2018-04-06 12:41   ` Leo Famulari
  2018-08-28  9:47 ` Ludovic Courtès
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2018-04-06  8:21 UTC (permalink / raw)
  To: Leo Famulari; +Cc: 30993

Hi Leo,

Leo Famulari <leo@famulari.name> skribis:

> Since the update to Shepherd 0.4.0, I've found that OpenSSH's sshd is
> killed almost immediately after it starts with signal 15. I confirmed
> the issue started with the Shepherd upgrade by bisecting our Git
> history.
>
> I can reproduce the issue from commit
> b6beda1d6b9093a8493b5c3cde33ed522242c451 (gnu: Add botan.).
>
> One interesting tidbit is that the PID file '/var/run/sshd.pid' is not
> created anymore. And if I create an empty PID file by hand, it is
> removed after trying to start the ssh-daemon service. Also, the sshd
> user's home '/var/run/sshd' does not exist, and is similarly removed if
> it does exist.

Weird.  On my laptop /var/run/sshd.pid exists and contains the right
PID.  /var/run/sshd does not exist, though.

When I run the config you posted in ‘guix system vm’, ‘ssh-daemon’ is
correctly started and I see a correct ssh.pid and /var/run/sshd exists
as well.

> I ran the OpenSSH system test `make check-system TESTS=openssh` and it
> failed when it could not find the PID file. It passed on another
> non-GuixSD machine. The failing machine is relatively slow and lacks
> KVM: a ThinkPad x200s.

FWIW on my x86_64 laptop, I’ve run it several times in a row (using
“guix build /gnu/store/…-openssh.drv --check”), and it always succeeds.

> I modified the sshd invocation to print some debug output ('-d -E
> /tmp/sshd.log') and this is what it shows:
>
> ------
> debug1: sshd version OpenSSH_7.6, OpenSSL 1.0.2o  27 Mar 2018
> debug1: private host key #0: ssh-rsa SHA256:REDACTED
> debug1: private host key #1: ssh-dss SHA256:REDACTED
> debug1: private host key #2: ecdsa-sha2-nistp256 SHA256:REDACTED
> debug1: private host key #3: ssh-ed25519 SHA256:REDACTED
> debug1: rexec_argv[0]='/gnu/store/az7vib8gk16fybhshh5xpkljmgxyrs4k-openssh-7.6p1/sbin/sshd'
> debug1: rexec_argv[1]='-D'
> debug1: rexec_argv[2]='-d'
> debug1: rexec_argv[3]='-E'
> debug1: rexec_argv[4]='/tmp/sshd.log'
> debug1: rexec_argv[5]='-f'
> debug1: rexec_argv[6]='/gnu/store/miy7xg5j4fg3mn04mcl27awmcl6s97ss-sshd_config'
> debug1: Set /proc/self/oom_score_adj from 0 to -1000
> debug1: Bind to port 22 on 0.0.0.0.
> Server listening on 0.0.0.0 port 22.
> debug1: Bind to port 22 on ::.
> Server listening on :: port 22.
> Received signal 15; terminating.
> ------

Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
when a service is stopped, and otherwise it’s sent when we’re shutting
the system down.  But here?…

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-04-06  8:21 ` Ludovic Courtès
@ 2018-04-06 12:41   ` Leo Famulari
  2018-04-06 14:37     ` Ludovic Courtès
  2018-05-01 13:13     ` Martin Castillo
  0 siblings, 2 replies; 21+ messages in thread
From: Leo Famulari @ 2018-04-06 12:41 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 30993

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]

On Fri, Apr 06, 2018 at 10:21:09AM +0200, Ludovic Courtès wrote:
[...]
> > Server listening on 0.0.0.0 port 22.
> > debug1: Bind to port 22 on ::.
> > Server listening on :: port 22.
> > Received signal 15; terminating.
> > ------
> 
> Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
> when a service is stopped, and otherwise it’s sent when we’re shutting
> the system down.  But here?…

Indeed, that is the question. Does anyone have advice for debugging?

I still have this issue and so I'm continuing to use Shepherd 0.3.2.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-04-06 12:41   ` Leo Famulari
@ 2018-04-06 14:37     ` Ludovic Courtès
  2018-05-01 13:13     ` Martin Castillo
  1 sibling, 0 replies; 21+ messages in thread
From: Ludovic Courtès @ 2018-04-06 14:37 UTC (permalink / raw)
  To: Leo Famulari; +Cc: 30993

Leo Famulari <leo@famulari.name> skribis:

> On Fri, Apr 06, 2018 at 10:21:09AM +0200, Ludovic Courtès wrote:
> [...]
>> > Server listening on 0.0.0.0 port 22.
>> > debug1: Bind to port 22 on ::.
>> > Server listening on :: port 22.
>> > Received signal 15; terminating.
>> > ------
>> 
>> Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
>> when a service is stopped, and otherwise it’s sent when we’re shutting
>> the system down.  But here?…
>
> Indeed, that is the question. Does anyone have advice for debugging?
>
> I still have this issue and so I'm continuing to use Shepherd 0.3.2.

For you the bug is 100% reproducible with 0.4.0 and never happens with
0.3.2?  Even in a ‘guix system vm’?

Ludo’.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-04-06 12:41   ` Leo Famulari
  2018-04-06 14:37     ` Ludovic Courtès
@ 2018-05-01 13:13     ` Martin Castillo
  2018-05-01 20:43       ` Ludovic Courtès
  1 sibling, 1 reply; 21+ messages in thread
From: Martin Castillo @ 2018-05-01 13:13 UTC (permalink / raw)
  To: 30993

(resending because I forgot to CC the list)

On 06.04.2018 14:41, Leo Famulari wrote:
>> Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
>> when a service is stopped, and otherwise it’s sent when we’re shutting
>> the system down.  But here?…
>
> Indeed, that is the question. Does anyone have advice for debugging?
>
> I still have this issue and so I'm continuing to use Shepherd 0.3.2.
>

Maybe one could somehow strace herd, or change the make-kill-destructor
to log every time it is being executed?

I'm not that fluent in guile so maybe someone else could do that?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-05-01 13:13     ` Martin Castillo
@ 2018-05-01 20:43       ` Ludovic Courtès
  2018-05-03 15:16         ` Martin Castillo
  0 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2018-05-01 20:43 UTC (permalink / raw)
  To: Martin Castillo; +Cc: 30993

Martin Castillo <castilma@uni-bremen.de> skribis:

> (resending because I forgot to CC the list)
>
> On 06.04.2018 14:41, Leo Famulari wrote:
>>> Hmm, where could that SIGTERM come from?  ‘make-kill-destructor’ uses it
>>> when a service is stopped, and otherwise it’s sent when we’re shutting
>>> the system down.  But here?…
>>
>> Indeed, that is the question. Does anyone have advice for debugging?
>>
>> I still have this issue and so I'm continuing to use Shepherd 0.3.2.
>>
>
> Maybe one could somehow strace herd, or change the make-kill-destructor
> to log every time it is being executed?

‘herd status sshd’ displays the last time sshd was respawned, but the
info ‘herd’ receives actually includes the dates of all the respawns,
not just the last one.  Is that what you’re asking for?

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-05-01 20:43       ` Ludovic Courtès
@ 2018-05-03 15:16         ` Martin Castillo
  2018-05-03 16:38           ` Leo Famulari
  0 siblings, 1 reply; 21+ messages in thread
From: Martin Castillo @ 2018-05-03 15:16 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 30993



On 01.05.2018 22:43, Ludovic Courtès wrote:
>> Maybe one could somehow strace herd, or change the make-kill-destructor
>> to log every time it is being executed?
> 
> ‘herd status sshd’ displays the last time sshd was respawned, but the
> info ‘herd’ receives actually includes the dates of all the respawns,
> not just the last one.  Is that what you’re asking for?
> 

My idea was that maybe make-kill-destructor is being called from
somewhere else. If this is being
logged, one could rule that out.

Another wild idea would be sshd killing itself for some reason. stracing
sshd would tell us, if that's the case. How would one do that? Does
shepherd provide some debugging functions?

Or does linux provide a way to log all sent signals so one could find
the sending process?

Martin

-- 
GPG: 7FDE 7190 2F73 2C50 236E  403D CC13 48F1 E644 08EC

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-05-03 15:16         ` Martin Castillo
@ 2018-05-03 16:38           ` Leo Famulari
  2018-05-04  2:01             ` Martin Castillo
  0 siblings, 1 reply; 21+ messages in thread
From: Leo Famulari @ 2018-05-03 16:38 UTC (permalink / raw)
  To: Martin Castillo; +Cc: 30993

[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]

On Thu, May 03, 2018 at 05:16:32PM +0200, Martin Castillo wrote:
> 
> 
> On 01.05.2018 22:43, Ludovic Courtès wrote:
> >> Maybe one could somehow strace herd, or change the make-kill-destructor
> >> to log every time it is being executed?
> > 
> > ‘herd status sshd’ displays the last time sshd was respawned, but the
> > info ‘herd’ receives actually includes the dates of all the respawns,
> > not just the last one.  Is that what you’re asking for?
> > 
> 
> My idea was that maybe make-kill-destructor is being called from
> somewhere else. If this is being
> logged, one could rule that out.
> 
> Another wild idea would be sshd killing itself for some reason. stracing
> sshd would tell us, if that's the case. How would one do that? Does
> shepherd provide some debugging functions?
> 
> Or does linux provide a way to log all sent signals so one could find
> the sending process?

I haven't had time to debug this yet, and Shepherd 0.3 still works.

Since nobody else can reproduce the bug, and since I expect OpenSSH to
be commonly used, I suspect some non-deterministic Guile mis-compilation
or filesystem corruption — the system in question is using ext4.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-05-03 16:38           ` Leo Famulari
@ 2018-05-04  2:01             ` Martin Castillo
  2018-05-06 19:50               ` Leo Famulari
  0 siblings, 1 reply; 21+ messages in thread
From: Martin Castillo @ 2018-05-04  2:01 UTC (permalink / raw)
  To: Leo Famulari; +Cc: 30993

Sorry,  I forgot to mention that I have the same problem. But I had it already with shepherd 0.3.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-05-04  2:01             ` Martin Castillo
@ 2018-05-06 19:50               ` Leo Famulari
  2018-05-07 19:10                 ` Martin Castillo
  0 siblings, 1 reply; 21+ messages in thread
From: Leo Famulari @ 2018-05-06 19:50 UTC (permalink / raw)
  To: Martin Castillo; +Cc: 30993

[-- Attachment #1: Type: text/plain, Size: 220 bytes --]

On Fri, May 04, 2018 at 04:01:52AM +0200, Martin Castillo wrote:
> Sorry,  I forgot to mention that I have the same problem. But I had it already with shepherd 0.3.

Interesting. Did it ever work for you on that system?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-05-06 19:50               ` Leo Famulari
@ 2018-05-07 19:10                 ` Martin Castillo
  2018-07-19 13:15                   ` bug#32197: " Clément Lassieur
  0 siblings, 1 reply; 21+ messages in thread
From: Martin Castillo @ 2018-05-07 19:10 UTC (permalink / raw)
  To: Leo Famulari; +Cc: 30993

[-- Attachment #1: Type: text/plain, Size: 699 bytes --]



On 06.05.2018 21:50, Leo Famulari wrote:
> On Fri, May 04, 2018 at 04:01:52AM +0200, Martin Castillo wrote:
>> Sorry,  I forgot to mention that I have the same problem. But I had it already with shepherd 0.3.
> 
> Interesting. Did it ever work for you on that system?
> 
> that system?
Do you mean shepherd 0.3? Yes. And once(or so) with shepherd 0.4.

I reported that here [0]. Some of the mentioned files needed small
changes for the current guix, but ssh works with all of them, strangely.

I attached my current configuration, where I need to start the daemon
manually (herd start ssh-daemon) after each boot.

Martin



[0]: https://lists.gnu.org/archive/html/help-guix/2018-01/msg00112.html

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: cur.scm --]
[-- Type: text/x-scheme; name="cur.scm", Size: 8993 bytes --]

;; This is an operating system configuration template
;; for a "desktop" setup without full-blown desktop
;; environments.
;; This is an operating system configuration template
;; for a "bare bones" setup, with no X11 display server.

;(use-modules (gnu))
(use-modules (gnu) (gnu system nss) (guix gexp))
(use-service-modules ssh)
(use-service-modules base desktop xorg)
(use-package-modules certs linux suckless wm)
;(use-package-modules admin)

(define cyborg-quirk
"Section \"InputClass\"
	Identifier \"Saitek Cyborg R.A.T.5 Mouse\"
	MatchIsPointer \"on\"
	MatchProduct \"Saitek Cyborg R.A.T.5 Mouse\"
	MatchVendor \"Saitek|SAITEK\"
	MatchDevicePath \"/dev/input/event*\"
	Option \"ButtonMapping\" \"1 2 3 4 5 6 7 2 9 4 5 12 0 0 0\"
	Option \"ZAxisMapping\" \"10 11\"
#	Option \"AutoReleaseButtons\" \"12 13 14 15\"
EndSection
")

(define xneo
"Section \"InputClass\"
	Identifier \"keyboard-all\"
	Option \"XkbLayout\" \"neo\"
	MatchIsKeyboard \"on\"
EndSection
")

(define %my-xorg-conf
	(xorg-configuration-file #:extra-config (list cyborg-quirk xneo)))

(define %my-startx
	(xorg-start-command #:configuration-file %my-xorg-conf))

(define %my-services
       ;; My very own list of services.
       (modify-services %desktop-services
         (guix-service-type config =>
                            (guix-configuration
                             (inherit config)
			     (extra-options '("-M4"))
                             (substitute-urls (cons* "https://berlin.guixsd.org"
                                                     "https://hydra.gnu.org"
                                                     (guix-configuration-substitute-urls config)))))
         (openssh-service-type config =>
                            (openssh-configuration
                             (inherit config)
			     (x11-forwarding? (openssh-configuration-x11-forwarding? config))))
	 (slim-service-type config =>
			    (slim-configuration
			      (inherit config)
			      (startx %my-startx)
			      (default-user "mcd")))))

(define (openssh-config-file config)
  "Return the sshd configuration file corresponding to CONFIG."
  (computed-file
   "sshd_config"
   #~(begin
       (use-modules (ice-9 match))
       (call-with-output-file #$output
         (lambda (port)
           (display "# Generated by 'openssh-service'.\n" port)
           (format port "Port ~a\n"
                   #$(number->string
                      (openssh-configuration-port-number config)))
           (format port "PermitRootLogin ~a\n"
                   #$(match (openssh-configuration-permit-root-login config)
                       (#t "yes")
                       (#f "no")
                       ('without-password "without-password")))
           (format port "PermitEmptyPasswords ~a\n"
                   #$(if (openssh-configuration-allow-empty-passwords? config)
                         "yes" "no"))
           (format port "PasswordAuthentication ~a\n"
                   #$(if (openssh-configuration-password-authentication? config)
                         "yes" "no"))
           (format port "PubkeyAuthentication ~a\n"
                   #$(if (openssh-configuration-public-key-authentication?
                          config)
                         "yes" "no"))
           (format port "X11Forwarding ~a\n"
                   #$(if (openssh-configuration-x11-forwarding? config)
                         "yes" "no"))
           (format port "PidFile ~a\n"
                   #$(openssh-configuration-pid-file config))
           (format port "ChallengeResponseAuthentication ~a\n"
                   #$(if (openssh-challenge-response-authentication? config)
                         "yes" "no"))
           (format port "UsePAM ~a\n"
                   #$(if (openssh-configuration-use-pam? config)
                         "yes" "no"))
           (format port "PrintLastLog ~a\n"
                   #$(if (openssh-configuration-print-last-log? config)
                         "yes" "no"))

           ;; Add '/etc/authorized_keys.d/%u', which we populate.
           (format port "AuthorizedKeysFile \
 .ssh/authorized_keys .ssh/authorized_keys2 /etc/ssh/authorized_keys.d/%u\n")

           (for-each (lambda (s) (format port "AcceptEnv ~a\n" s))
                     '#$(openssh-configuration-accepted-environment config))

;           (for-each
;            (match-lambda
;              ((name command) (format port "Subsystem\t~a\t~a\n" name command)))
;            '#$(openssh-configuration-subsystems config))
           #t)))))

(define (my-openssh-shepherd-service config)
  "Return a <shepherd-service> for openssh with CONFIG."

  (define openssh-command
    #~(list (string-append #$(openssh-configuration-openssh config) "/sbin/sshd")
            "-D" "-f" #$(openssh-config-file config) "-d" "-E" "/var/log/sshd.debug"))

  (list (shepherd-service
         (documentation "OpenSSH server.")
         (requirement '(syslogd))
         (provision '(ssh-daemon))
         (start #~(make-forkexec-constructor #$openssh-command
                                             #:pid-file "/var/run/sshd.pid"))
;                                             #:pid-file #$pid-file))
         (stop #~(make-kill-destructor))
         (auto-start? (openssh-auto-start? config)))))

(operating-system
  (host-name "komputilo")
  (timezone "Europe/Berlin")
  (locale "de_DE.UTF-8")
  (hosts-file (plain-file "hosts"
"127.0.0.1 localhost komputilo
::1       localhost komputilo

192.168.178.20 pi.fritz.box pi"))

  ;; Assuming /dev/sdX is the target hard disk, and "my-root" is
  ;; the label of the target root file system.
  (bootloader (bootloader-configuration (bootloader
                                          (bootloader (inherit grub-bootloader)
                                                      (installer #~(const #t))))))
;  (bootloader (bootloader-configuration (bootloader grub-bootloader)
;					(target "/dev/sda3")
				  ;(menu-entries '((menu-entry
				;		    (label "Grub Nixos")
				;		    (multiboot "(hd0)") ; still unsupported
				;                   )))
;					))
  (file-systems (cons (file-system
                        (device "guix-root")
                        (title 'label)
                        (mount-point "/")
                        (type "ext4"))
                      %base-file-systems))

  ;; This is where user accounts are specified.  The "root"
  ;; account is implicit, and is initially created with the
  ;; empty password.
  (users (cons (user-account
                (name "mcd")
                (comment "Martin Castillo")
                (group "users")

                ;; Adding the account to the "wheel" group
                ;; makes it a sudoer.  Adding it to "audio"
                ;; and "video" allows the user to play sound
                ;; and access the webcam.
                (supplementary-groups '("wheel" ; "plugdev" ? für mtp? ist aber undefiniert
                                        "audio" "video" "netdev")) ;workmanager"))
                (home-directory "/home/mcd"))
               %base-user-accounts))

  ;; Globally-installed packages.
  ;(packages (cons tcpdump %base-packages))
  ;; Add a bunch of window managers; we can choose one at
  ;; the log-in screen with F1.
  (packages (cons* dmenu ;window managers
                   nss-certs                      ;for HTTPS access
                   kbd-neo
                   %base-packages))

  ;; Add services to the baseline: a DHCP client and
  ;; an SSH server.
  (services (cons* ;(dhcp-client-service)
;                   (gpm-service) ;mouse in vt
                   (console-keymap-service (file-append kbd-neo "/share/keymaps/neo.map"));;
;                   (my-openssh-shepherd-service
;                            (openssh-configuration
;                              (accepted-environment '("COLORTERM"))
;                              (port-number 2222)))
                   (service openssh-service-type
                            (openssh-configuration
                              (accepted-environment '("COLORTERM"))
                              (port-number 2222)))

                   (simple-service 'store-my-config etc-service-type
                                   `(("config.scm" ,(local-file (assoc-ref
                                                                  (current-source-location)
                                                                  'filename)))))
                   (xfce-desktop-service)
                   %my-services))

  ;; Use the "desktop" services, which include the X11
  ;; log-in service, networking with Wicd, and more.
  ;(services %desktop-services)

  ;; Allow resolution of '.local' host names with mDNS.
  (name-service-switch %mdns-host-lookup-nss))

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#32197: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-05-07 19:10                 ` Martin Castillo
@ 2018-07-19 13:15                   ` Clément Lassieur
  2018-07-19 14:26                     ` Clément Lassieur
  2018-07-23 17:08                     ` Martin Castillo
  0 siblings, 2 replies; 21+ messages in thread
From: Clément Lassieur @ 2018-07-19 13:15 UTC (permalink / raw)
  To: Martin Castillo, Leo Famulari; +Cc: 30993, 32197

Heya,

Martin Castillo <castilma@uni-bremen.de> writes:

> On 06.05.2018 21:50, Leo Famulari wrote:
>> On Fri, May 04, 2018 at 04:01:52AM +0200, Martin Castillo wrote:
>>> Sorry,  I forgot to mention that I have the same problem. But I had it already with shepherd 0.3.
>> 
>> Interesting. Did it ever work for you on that system?
>> 
>> that system?
> Do you mean shepherd 0.3? Yes. And once(or so) with shepherd 0.4.
>
> I reported that here [0]. Some of the mentioned files needed small
> changes for the current guix, but ssh works with all of them, strangely.
>
> I attached my current configuration, where I need to start the daemon
> manually (herd start ssh-daemon) after each boot.

I don't think you're having the same bug.  Martin can manually start the
daemon, whereas Leo can't.  So Martin seems to have
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=32197, for which a commit
has been pushed by Julien.

Martin, could you please test it?  (You just need to 'guix pull' and try
again.)  Leo, if you confirm my analysis, could you please unmerge the
bugs?

Thanks,
Clément

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#32197: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-07-19 13:15                   ` bug#32197: " Clément Lassieur
@ 2018-07-19 14:26                     ` Clément Lassieur
  2018-07-19 16:57                       ` Leo Famulari
  2018-07-23 17:08                     ` Martin Castillo
  1 sibling, 1 reply; 21+ messages in thread
From: Clément Lassieur @ 2018-07-19 14:26 UTC (permalink / raw)
  To: Martin Castillo, Leo Famulari; +Cc: 30993, 32197

Clément Lassieur <clement@lassieur.org> writes:

> Leo, if you confirm my analysis, could you please unmerge the bugs?

I did it, because Eric confirmed the fix.
Clément

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#32197: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-07-19 14:26                     ` Clément Lassieur
@ 2018-07-19 16:57                       ` Leo Famulari
  0 siblings, 0 replies; 21+ messages in thread
From: Leo Famulari @ 2018-07-19 16:57 UTC (permalink / raw)
  To: Clément Lassieur; +Cc: 32197, 30993

[-- Attachment #1: Type: text/plain, Size: 280 bytes --]

On Thu, Jul 19, 2018 at 04:26:59PM +0200, Clément Lassieur wrote:
> Clément Lassieur <clement@lassieur.org> writes:
> 
> > Leo, if you confirm my analysis, could you please unmerge the bugs?
> 
> I did it, because Eric confirmed the fix.

Thanks, sorry for the confusion!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#32197: bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-07-19 13:15                   ` bug#32197: " Clément Lassieur
  2018-07-19 14:26                     ` Clément Lassieur
@ 2018-07-23 17:08                     ` Martin Castillo
  1 sibling, 0 replies; 21+ messages in thread
From: Martin Castillo @ 2018-07-23 17:08 UTC (permalink / raw)
  To: Clément Lassieur, Leo Famulari; +Cc: 30993, 32197


[-- Attachment #1.1.1: Type: text/plain, Size: 533 bytes --]

On 19.07.2018 15:15, Clément Lassieur wrote:
> [...]
> Martin, could you please test it?  (You just need to 'guix pull' and try
> again.)  Leo, if you confirm my analysis, could you please unmerge the
> bugs?
>
> Thanks,
> Clément
>

It still does not work for me.

Attached are my guix version, dmesg|grep shepherd output, config.scm and
my qemu invocation (metal).

I run that system in qemu, but it is installed directly on my harddrive.

Martin

-- 
GPG: 7FDE 7190 2F73 2C50 236E  403D CC13 48F1 E644 08EC

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.2: config.scm --]
[-- Type: text/x-scheme; name="config.scm", Size: 5455 bytes --]

;; This is an operating system configuration template
;; for a "desktop" setup without full-blown desktop
;; environments.
;; This is an operating system configuration template
;; for a "bare bones" setup, with no X11 display server.

;(use-modules (gnu))
(use-modules (gnu) (gnu system nss))
(use-service-modules ssh)
(use-service-modules base desktop xorg)
(use-package-modules certs linux suckless wm)
;(use-package-modules admin)

(define cyborg-quirk
"Section \"InputClass\"
	Identifier \"Saitek Cyborg R.A.T.5 Mouse\"
	MatchIsPointer \"on\"
	MatchProduct \"Saitek Cyborg R.A.T.5 Mouse\"
	MatchVendor \"Saitek|SAITEK\"
	MatchDevicePath \"/dev/input/event*\"
	Option \"ButtonMapping\" \"1 2 3 4 5 6 7 2 9 4 5 12 0 0 0\"
	Option \"ZAxisMapping\" \"10 11\"
#	Option \"AutoReleaseButtons\" \"12 13 14 15\"
EndSection
")

(define xneo
"Section \"InputClass\"
	Identifier \"keyboard-all\"
	Option \"XkbLayout\" \"neo\"
	MatchIsKeyboard \"on\"
EndSection
")

(define %my-xorg-conf
	(xorg-configuration-file #:extra-config (list cyborg-quirk xneo)))

(define %my-startx
	(xorg-start-command #:configuration-file %my-xorg-conf))

(define %my-services
       ;; My very own list of services.
       (modify-services %desktop-services
         (guix-service-type config =>
                            (guix-configuration
                             (inherit config)
			     (extra-options '("-M4"))
                             (substitute-urls (cons* "https://berlin.guixsd.org"
                                                     "https://hydra.gnu.org"
                                                     (guix-configuration-substitute-urls config)))))
         (openssh-service-type config =>
                            (openssh-configuration
                             (inherit config)
			     (x11-forwarding? (openssh-configuration-x11-forwarding? config))))
	 (slim-service-type config =>
			    (slim-configuration
			      (inherit config)
			      (startx %my-startx)
			      (default-user "mcd")))))


(operating-system
  (host-name "komputilo")
  (timezone "Europe/Berlin")
  (locale "de_DE.UTF-8")
  (hosts-file (plain-file "hosts"
"127.0.0.1 localhost komputilo
::1       localhost komputilo

192.168.178.20 pi.fritz.box pi"))

  ;; Assuming /dev/sdX is the target hard disk, and "my-root" is
  ;; the label of the target root file system.
  (bootloader (bootloader-configuration (bootloader
                                          (bootloader (inherit grub-bootloader)
                                                      (installer #~(const #t))))))
;  (bootloader (bootloader-configuration (bootloader grub-bootloader)
;					(target "/dev/sda3")
				  ;(menu-entries '((menu-entry
				;		    (label "Grub Nixos")
				;		    (multiboot "(hd0)") ; still unsupported
				;                   )))
;					))
  (file-systems (cons (file-system
                        (device (file-system-label "guix-root"))
                        (mount-point "/")
                        (type "ext4"))
                      %base-file-systems))

  ;; This is where user accounts are specified.  The "root"
  ;; account is implicit, and is initially created with the
  ;; empty password.
  (users (cons (user-account
                (name "mcd")
                (comment "Martin Castillo")
                (group "users")

                ;; Adding the account to the "wheel" group
                ;; makes it a sudoer.  Adding it to "audio"
                ;; and "video" allows the user to play sound
                ;; and access the webcam.
                (supplementary-groups '("wheel" ; "plugdev" ? für mtp? ist aber undefiniert
                                        "audio" "video" "netdev")) ;workmanager"))
                (home-directory "/home/mcd"))
               %base-user-accounts))

  ;; Globally-installed packages.
  ;(packages (cons tcpdump %base-packages))
  ;; Add a bunch of window managers; we can choose one at
  ;; the log-in screen with F1.
  (packages (cons* dmenu ;window managers
                   nss-certs                      ;for HTTPS access
                   kbd-neo
                   %base-packages))

  ;; Add services to the baseline: a DHCP client and
  ;; an SSH server.
  (services (cons* ;(dhcp-client-service)
;                   (gpm-service) ;mouse in vt
                   (console-keymap-service (file-append kbd-neo "/share/keymaps/neo.map"));;
                   (service openssh-service-type
                            (openssh-configuration
                              (x11-forwarding? #t)
                              (accepted-environment '("COLORTERM" "Foo"))
                              (port-number 22)))

                   (simple-service 'store-my-config etc-service-type
                                   `(("config.scm" ,(local-file (assoc-ref
                                                                  (current-source-location)
                                                                  'filename)))))
                   (xfce-desktop-service)
                   %my-services))

  ;; Use the "desktop" services, which include the X11
  ;; log-in service, networking with Wicd, and more.
  ;(services %desktop-services)

  ;; Allow resolution of '.local' host names with mDNS.
  (name-service-switch %mdns-host-lookup-nss))

[-- Attachment #1.1.3: shepherd.dmesg --]
[-- Type: text/plain, Size: 2079 bytes --]

[   11.764507] shepherd[1]: Service root has been started.
[   14.719803] shepherd[1]: starting services...
[   14.722728] shepherd[1]: Service root-file-system has been started.
[   14.725482] shepherd[1]: Service user-file-systems has been started.
[   14.729274] shepherd[1]: Service file-system-/dev/pts has been started.
[   14.732863] shepherd[1]: Service file-system-/dev/shm has been started.
[   14.736110] shepherd[1]: Service file-system-/gnu/store has been started.
[   14.750763] shepherd[1]: Service file-system-/run/systemd has been started.
[   14.757029] shepherd[1]: Service file-system-/run/user has been started.
[   14.760723] shepherd[1]: Service file-system-/sys/fs/cgroup has been started.
[   14.765708] shepherd[1]: Service file-system-/sys/fs/cgroup/elogind has been started.
[   14.776834] shepherd[1]: Service file-system-/sys/fs/cgroup/cpuset has been started.
[   14.790800] shepherd[1]: Service file-system-/sys/fs/cgroup/cpu has been started.
[   14.795139] shepherd[1]: Service file-system-/sys/fs/cgroup/cpuacct has been started.
[   14.810192] shepherd[1]: Service file-system-/sys/fs/cgroup/memory has been started.
[   14.814319] shepherd[1]: Service file-system-/sys/fs/cgroup/devices has been started.
[   14.827923] shepherd[1]: Service file-system-/sys/fs/cgroup/freezer has been started.
[   14.834046] shepherd[1]: Service file-system-/sys/fs/cgroup/blkio has been started.
[   14.838283] shepherd[1]: Service file-system-/sys/fs/cgroup/perf_event has been started.
[   14.841524] shepherd[1]: Service file-systems has been started.
[   14.966786] shepherd[1]: waiting for udevd...
[   15.627793] shepherd[1]: Service udev has been started.
[   15.679916] shepherd[1]: Service urandom-seed has been started.
[   15.684068] shepherd[1]: Service user-processes has been started.
[   15.688369] shepherd[1]: Service host-name has been started.
[   15.719811] shepherd[1]: Service user-homes could not be started.
[   16.737051] shepherd[1]: Service nscd has been started.
[   16.780356] shepherd[1]: Service guix-daemon has been started.

[-- Attachment #1.1.4: version --]
[-- Type: text/plain, Size: 292 bytes --]

guix (GNU Guix) 264967c883d32606c18b378f717c303e7490c942
Copyright (C) 2018 the Guix authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

[-- Attachment #1.1.5: metal --]
[-- Type: text/plain, Size: 555 bytes --]

# runs qemu as mcd but with read and write access to sda

mount |egrep -q 'sd(a|b)3'  && echo guix-root is maybe mounted. Aborting. && exit

set -v
sudo sh -c 'exec sudo -u mcd -C 6 sh -c "
  exec qemu-system-x86_64 -m 1800 -smp 2 -enable-kvm \
		-net nic,model=virtio \
		-net user,hostfwd=tcp::5560-:2222,hostfwd=tcp::5559-:22 \
		-add-fd fd=5,set=2,opaque=rdwr:$(readlink -f /dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF6L7MS20ALEK) \
		-drive file=/dev/fdset/2,index=0,media=disk" \
		5<>/dev/disk/by-id/ata-Hitachi_HDT721010SLA360_STF6L7MS20ALEK '

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-03-29 20:08 bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Leo Famulari
  2018-04-06  8:21 ` Ludovic Courtès
@ 2018-08-28  9:47 ` Ludovic Courtès
  2018-11-17  9:46 ` bug#30993: issue: ssh-daemon could not be started Taegil Bae
  2019-05-14 13:33 ` bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Ludovic Courtès
  3 siblings, 0 replies; 21+ messages in thread
From: Ludovic Courtès @ 2018-08-28  9:47 UTC (permalink / raw)
  To: Leo Famulari; +Cc: 30993

Hi Leo,

Leo Famulari <leo@famulari.name> skribis:

> Since the update to Shepherd 0.4.0, I've found that OpenSSH's sshd is
> killed almost immediately after it starts with signal 15. I confirmed
> the issue started with the Shepherd upgrade by bisecting our Git
> history.
>
> I can reproduce the issue from commit
> b6beda1d6b9093a8493b5c3cde33ed522242c451 (gnu: Add botan.).

I’m “happy” to say that I experienced this on a server—not having ssh
access to a remote server is fairly annoying, I definitely sympathize…

What I see is this:

--8<---------------cut here---------------start------------->8---
Aug  6 07:56:40 localhost shepherd[1]: Service loopback has been started. 

[...]

Aug  6 07:56:51 localhost sshd[606]: Server listening on 0.0.0.0 port 22.

[...]

Aug  6 07:57:05 localhost shepherd[1]: Service ssh-daemon could not be started. 

[...]

Aug  6 07:57:46 localhost vmunix: [   10.049791] random: ssh-keygen: uninitialized urandom read (32 bytes read)
--8<---------------cut here---------------end--------------->8---

(Note that the last message was pulled from /dev/kmsg by syslogd, but
it’s about an event that actually occurred before the first message.)

It waited for ~15 seconds, although ‘%pid-file-timeout’ in (shepherd
service) is only 5 seconds.

The SIGTERM you were seeing very likely comes from this bit:

--8<---------------cut here---------------start------------->8---
      (match (read-pid-file pid-file
                            #:max-delay pid-file-timeout)
        (#f
         (catch-system-error (kill pid SIGTERM))
         #f)
        ((? integer? pid)
         pid))
--8<---------------cut here---------------end--------------->8---

On another machine:

--8<---------------cut here---------------start------------->8---
Aug 28 09:10:49 localhost sshd[435]: Server listening on 0.0.0.0 port 22.
Aug 28 09:10:49 localhost sshd[435]: Server listening on :: port 22.

[...]

Aug 28 09:10:50 localhost shepherd[1]: Service ssh-daemon has been started. 
--8<---------------cut here---------------end--------------->8---

I wonder if this has to do with IPv6 (the failing case lacks the “Server
listening on ::” line), or if it’s just sshd occasionally taking a long
time to start.

Is it easily reproducible for you?  Did you eventually gather more
details?

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: issue: ssh-daemon could not be started
  2018-03-29 20:08 bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Leo Famulari
  2018-04-06  8:21 ` Ludovic Courtès
  2018-08-28  9:47 ` Ludovic Courtès
@ 2018-11-17  9:46 ` Taegil Bae
  2018-11-19 21:22   ` Ludovic Courtès
  2019-05-14 13:33 ` bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Ludovic Courtès
  3 siblings, 1 reply; 21+ messages in thread
From: Taegil Bae @ 2018-11-17  9:46 UTC (permalink / raw)
  To: 30993

Hi,

I have experienced this issue on a slow machine (Thinkpad T60). By
placing avahi-daemon after ssh-daemon, I fixed this issue. I added
ssh-daemon into the requirement of avahi-shepherd-service, and
reconfigured the system.

On the machine, openssh and dropbear both had this issue. I checked
that this trick works for the two ssh server. On another machine(Qemu
VM, faster), openssh only had the trouble. Through several
reconfiguring the system, it was fixed with the original
configuration. I think that it was because the reconfigurations
reordered the services.

Taegil

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: issue: ssh-daemon could not be started
  2018-11-17  9:46 ` bug#30993: issue: ssh-daemon could not be started Taegil Bae
@ 2018-11-19 21:22   ` Ludovic Courtès
  2018-11-20  1:33     ` Taegil Bae
  0 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2018-11-19 21:22 UTC (permalink / raw)
  To: Taegil Bae; +Cc: 30993

Hello,

Taegil Bae <esrevinu@gmail.com> skribis:

> I have experienced this issue on a slow machine (Thinkpad T60). By
> placing avahi-daemon after ssh-daemon, I fixed this issue. I added
> ssh-daemon into the requirement of avahi-shepherd-service, and
> reconfigured the system.

This is very surprising.  Are you sure this is fully reproducible?
How would you explain this?

Thank you,
Ludo’.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: issue: ssh-daemon could not be started
  2018-11-19 21:22   ` Ludovic Courtès
@ 2018-11-20  1:33     ` Taegil Bae
  0 siblings, 0 replies; 21+ messages in thread
From: Taegil Bae @ 2018-11-20  1:33 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 30993

Hello,

On 11/20/18 6:22 AM, Ludovic Courtès wrote:
> Taegil Bae <esrevinu@gmail.com> skribis:
>
>> I have experienced this issue on a slow machine (Thinkpad T60). By
>> placing avahi-daemon after ssh-daemon, I fixed this issue. I added
>> ssh-daemon into the requirement of avahi-shepherd-service, and
>> reconfigured the system.
> This is very surprising.  Are you sure this is fully reproducible?
> How would you explain this?

At least in my machine it seems reproducible. I just reproduced that: 
reconfigured the system without my modification of avahi-daemon, checked 
ssh-daemon not started, reconfigured the system again with the 
modification, and then checked ssh-daemon started.

I am not a professional. But I guess that avahi-daemon manipulates 
network things such as the host name, and ssh-daemon waits for that to 
be completed. Look at this failing case:

Nov 20 09:37:57 localhost avahi-daemon[344]: Found user 'avahi' (UID 
985) and group 'avahi' (GID 973).
Nov 20 09:37:59 localhost avahi-daemon[344]: Successfully dropped root 
privileges.
Nov 20 09:38:00 localhost avahi-daemon[344]: avahi-daemon 0.7 starting up.
Nov 20 09:38:18 localhost shepherd[1]: Service avahi-daemon has been 
started.
Nov 20 09:38:01 localhost avahi-daemon[344]: WARNING: No NSS support for 
mDNS detected, consider installing nss-mdns!
Nov 20 09:38:05 localhost avahi-daemon[344]: Successfully called chroot().
Nov 20 09:38:12 localhost avahi-daemon[344]: Successfully dropped 
remaining capabilities.
Nov 20 09:38:32 localhost shepherd[1]: Service ssh-daemon could not be 
started.
Nov 20 09:38:15 localhost avahi-daemon[344]: Loading service file 
/services/sftp-ssh.service.
Nov 20 09:38:17 localhost avahi-daemon[344]: Loading service file 
/services/ssh.service.
Nov 20 09:38:19 localhost avahi-daemon[344]: Network interface 
enumeration completed.
Nov 20 09:38:22 localhost avahi-daemon[344]: Server startup complete. 
Host name is gravity.local. Local service cookie is 4157419020.
Nov 20 09:38:29 localhost avahi-daemon[344]: Service "gravity" 
(/services/ssh.service) successfully established.
Nov 20 09:38:32 localhost avahi-daemon[344]: Service "gravity" 
(/services/sftp-ssh.service) successfully established.
Nov 20 09:38:36 localhost avahi-daemon[344]: write() failed while 
writing return value to pipe: Broken pipe
Nov 20 09:39:12 localhost avahi-daemon[344]: Joining mDNS multicast 
group on interface wls3.IPv6 with address fe80::4c7d:9233:7845:eb88.
Nov 20 09:39:12 localhost avahi-daemon[344]: New relevant interface 
wls3.IPv6 for mDNS.
Nov 20 09:39:12 localhost avahi-daemon[344]: Registering new address 
record for fe80::4c7d:9233:7845:eb88 on wls3.*.
Nov 20 09:39:12 localhost avahi-daemon[344]: Joining mDNS multicast 
group on interface wls3.IPv4 with address 192.168.42.242.
Nov 20 09:39:12 localhost avahi-daemon[344]: New relevant interface 
wls3.IPv4 for mDNS.
Nov 20 09:39:12 localhost avahi-daemon[344]: Registering new address 
record for 192.168.42.242 on wls3.IPv4.

Regards,

Taegil

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2018-03-29 20:08 bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Leo Famulari
                   ` (2 preceding siblings ...)
  2018-11-17  9:46 ` bug#30993: issue: ssh-daemon could not be started Taegil Bae
@ 2019-05-14 13:33 ` Ludovic Courtès
  2019-05-14 18:21   ` Leo Famulari
  3 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2019-05-14 13:33 UTC (permalink / raw)
  To: Leo Famulari; +Cc: 30993

Hi Leo,

Leo Famulari <leo@famulari.name> skribis:

> One interesting tidbit is that the PID file '/var/run/sshd.pid' is not
> created anymore. And if I create an empty PID file by hand, it is
> removed after trying to start the ssh-daemon service. Also, the sshd
> user's home '/var/run/sshd' does not exist, and is similarly removed if
> it does exist.

There are reasons to believe that this issue is fixed by the Shepherd 0.6.1:

  https://issues.guix.info/issue/35550

Could you check somehow if the bug shows up again?

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#30993: OpenSSH sshd killed by Shepherd 0.4.0
  2019-05-14 13:33 ` bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Ludovic Courtès
@ 2019-05-14 18:21   ` Leo Famulari
  0 siblings, 0 replies; 21+ messages in thread
From: Leo Famulari @ 2019-05-14 18:21 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 30993-done

[-- Attachment #1: Type: text/plain, Size: 904 bytes --]

On Tue, May 14, 2019 at 03:33:59PM +0200, Ludovic Courtès wrote:
> There are reasons to believe that this issue is fixed by the Shepherd 0.6.1:
> 
>   https://issues.guix.info/issue/35550
> 
> Could you check somehow if the bug shows up again?

The bug disappeared for me a couple of reboots after reinstalling the
Guix System on my affected machine. That reinstallation provided
Shepherd 0.5, although 0.5 was also tested unsuccessfully before
reinstalling.

The issue does not manifest for me with Shepherd 0.6, 0.6.1 or Guix 1.0.

So... I think the bug was probably some kind of race condition or TOCTOU
problem that went away with a less fragmented or full filesystem (I was
really pushing the limits in that regard).

It's great that Shepherd 0.6.1 improved the PID file handling, assuming
that was the culprit.

I am closing this bug, but we can reopen it later if necessary.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-05-14 18:25 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-29 20:08 bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Leo Famulari
2018-04-06  8:21 ` Ludovic Courtès
2018-04-06 12:41   ` Leo Famulari
2018-04-06 14:37     ` Ludovic Courtès
2018-05-01 13:13     ` Martin Castillo
2018-05-01 20:43       ` Ludovic Courtès
2018-05-03 15:16         ` Martin Castillo
2018-05-03 16:38           ` Leo Famulari
2018-05-04  2:01             ` Martin Castillo
2018-05-06 19:50               ` Leo Famulari
2018-05-07 19:10                 ` Martin Castillo
2018-07-19 13:15                   ` bug#32197: " Clément Lassieur
2018-07-19 14:26                     ` Clément Lassieur
2018-07-19 16:57                       ` Leo Famulari
2018-07-23 17:08                     ` Martin Castillo
2018-08-28  9:47 ` Ludovic Courtès
2018-11-17  9:46 ` bug#30993: issue: ssh-daemon could not be started Taegil Bae
2018-11-19 21:22   ` Ludovic Courtès
2018-11-20  1:33     ` Taegil Bae
2019-05-14 13:33 ` bug#30993: OpenSSH sshd killed by Shepherd 0.4.0 Ludovic Courtès
2019-05-14 18:21   ` Leo Famulari

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).