* bug#73966: 'guix deploy' fails when adding 'elogind-service-type'
@ 2024-10-23 15:04 Fabio Natali via Bug reports for GNU Guix
From: Fabio Natali via Bug reports for GNU Guix @ 2024-10-23 15:04 UTC (permalink / raw)
  To: 73966

Dear All,

This is to briefly report an issue that I encountered yesterday, while
reconfiguring one of my servers.

The machine is headless and runs a minimal system without
desktop-related software. Yesterday I had to install the elogind
service, so I added a one-liner to my configuration file and redeployed
with 'guix deploy', which failed badly.

'guix deploy' terminates with this error:

--8<---------------cut here---------------start------------->8---
guix deploy: error: failed to deploy host: remote command
'/run/setuid-programs/sudo -n --
/gnu/store/xv4cd7qz4yan93zkjisbmbpxfz78hah2-guile-3.0.9/bin/guile
--no-auto-compile -L /gnu/store/gai5i4ba2xf084big8h56q6pc0vwx2sj-module-import
-C /gnu/store/gai5i4ba2xf084big8h56q6pc0vwx2sj-module-import -c "(begin
(use-modules (guix repl)) (send-repl-response (quote (with-output-to-port
(current-error-port) (lambda () (primitive-load
\"/gnu/store/gkh9yvyfdlnzpi9j9h8w4df0qz3jim2x-remote-exp.scm\"))))
(current-output-port)) (force-output))"' failed with status 1
--8<---------------cut here---------------end--------------->8---

The system is left in a non-working state. Attempts at opening new
terminal sessions fail, with the user being logged out immediately, both
when connecting via SSH and when logging in from a TTY. Already
established terminal sessions start throwing errors like this:

--8<---------------cut here---------------start------------->8---
user@host ~$ sudo su -
sudo: pam_open_session: Error in service module
sudo: policy plugin failed session initialization
Segmentation fault
--8<---------------cut here---------------end--------------->8---

I have to use Magic SysRq to reboot and get back to a working system.

I initially blamed this on some quirk of this particular machine, but
then I was able to reproduce it in a VM. Here are the steps.

Start from a system definition 'server.scm', along the lines of:

--8<---------------cut here---------------start------------->8---
(use-modules (gnu)
             (gnu machine)
             (gnu machine ssh)
             (gnu services desktop)
             (gnu services networking)
             (gnu services ssh))

(define %user-authorized-key
  (plain-file
   "user-authorized-key.pub" "ssh-rsa SSH-PUBLIC-KEY"))

(define %guix-authorized-key
  (plain-file
   "guix-authorized-key.pub"
   "(public-key
(ecc (curve Ed25519) (q GUIX-AUTHORIZED-KEY)))"))

(define test-server-operating-system
  (operating-system
    (host-name "host")
    (bootloader (bootloader-configuration
                 (bootloader grub-bootloader)
                 (targets '("/dev/vda"))))
    (file-systems (cons
                   (file-system
                     (device "/dev/vda1")
                     (mount-point "/")
                     (type "ext4"))
                   %base-file-systems))

    (users
     (list (user-account
            (name "user")
            (group "users")
            (supplementary-groups '("wheel")))))

    (sudoers-file
     (plain-file
      "sudoers"
      (string-append
       (plain-file-content %sudoers-specification)
       "%wheel ALL = NOPASSWD: ALL")))

    (services
     (cons*
      (service dhcp-client-service-type)
      (service openssh-service-type
               (openssh-configuration
                (authorized-keys `(("user"
                                    ,%user-authorized-key)
                                   ("root"
                                    ,%user-authorized-key)))
                (permit-root-login 'prohibit-password)))
      ;; Enable the elogind service and redeploy to trigger the issue.
      ;; (service elogind-service-type)
      (modify-services
          %base-services
        (guix-service-type config =>
                           (guix-configuration
                            (authorized-keys
                             (cons %guix-authorized-key
                                   %default-authorized-guix-keys)))))))))

(define test-server-machine
  (machine-ssh-configuration
   (host-key "ssh-ed25519 MACHINE-PUBLIC-KEY")
   (host-name "localhost")
   (port 2222)
   (identity "/home/user/.ssh/id_rsa_guix_image")
   (system "x86_64-linux")
   (user "user")))

(list
 (machine
  (operating-system test-server-operating-system)
  (environment managed-host-environment-type)
  (configuration test-server-machine)))

test-server-operating-system
--8<---------------cut here---------------end--------------->8---

Create an image 'image.qcow2' based on the above definition:

--8<---------------cut here---------------start------------->8---
cp `guix system image \
    --image-size=20GB \
    --image-type=qcow2 \
    server.scm` image.qcow2
chmod u+w image.qcow2
--8<---------------cut here---------------end--------------->8---

The image can be run with this incantation or a variation thereof (here
the image lives in '/tmp'; adjust the 'file=' path as needed):

--8<---------------cut here---------------start------------->8---
guix shell qemu -- qemu-system-x86_64 \
    -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 \
    -enable-kvm -m 4096 -smp 2 \
    -device virtio-blk,drive=myhd \
    -drive if=none,file=/tmp/image.qcow2,id=myhd
--8<---------------cut here---------------end--------------->8---

Everything should look fine so far. Log in as a user and take note of
the SSH host public key in '/etc/ssh/ssh_host_ed25519_key.pub', which
needs to be used as the 'host-key' in the 'machine-ssh-configuration' in
'server.scm'.
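
For what it's worth, the host key can probably also be read from the
host, without logging in to the guest. This is only a minimal sketch,
assuming 'ssh-keyscan' from OpenSSH is available on the host and the VM
is reachable on the forwarded port 2222. 'host_key_from_keyscan' is a
hypothetical helper name, not something Guix provides:

```shell
# Hypothetical helper: read `ssh-keyscan` output on stdin, where each
# key line looks like "HOST KEYTYPE BASE64KEY", and print the first key
# as "KEYTYPE BASE64KEY".  Comment lines starting with '#' are skipped.
host_key_from_keyscan() {
    awk '!/^#/ && NF >= 3 { print $2, $3; exit }'
}

# Usage against the running VM (assumes the port forwarding shown above):
#   ssh-keyscan -p 2222 -t ed25519 localhost 2>/dev/null | host_key_from_keyscan
```

The resulting string is what would go into the 'host-key' field. Since
the key is fetched over the network without verification, it seems
sensible to compare it against the file on the guest at least once.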

Now uncomment the line with the 'elogind' service and comment out the
last line, 'test-server-operating-system', so that the system definition
can be fed to 'guix deploy'.
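
These two edits can be scripted, for instance when repeating the test
several times. A minimal sketch, assuming GNU sed and a 'server.scm'
laid out exactly as above. 'prepare_for_deploy' is a hypothetical helper
name:

```shell
# Hypothetical helper: edit the system definition given as $1 in place.
prepare_for_deploy() {
    # First expression: uncomment the elogind service line, keeping its
    # indentation.  Second expression: comment out the bare trailing
    # 'test-server-operating-system' line; the 'define' form and the
    # reference inside 'machine' do not match and are left alone.
    sed -i \
        -e 's|^\( *\);; (service elogind-service-type)|\1(service elogind-service-type)|' \
        -e 's|^test-server-operating-system$|;; test-server-operating-system|' \
        "$1"
}

# Usage:
#   prepare_for_deploy server.scm
```

After this, the file evaluates to the list of machines rather than to
the operating system.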

Run 'guix deploy server.scm' while the QEMU machine is still running.

BOOM.

This should have triggered the error: the deploy fails and the VM is no
longer responsive.

Note that this is only triggered if the system definition includes a
non-root user. Also note that the reconfiguration succeeds when run from
within the machine, i.e. via 'guix system reconfigure ...' (from within
the guest) as opposed to 'guix deploy' (from the host).

I just wanted to brain-dump this here. It's not blocking for me at the
moment, but I guess it's good to have it reported. Any ideas are welcome.
I'll also try to get back to this when time permits.

Thanks, cheers, Fabio.


-- 
Fabio Natali
https://fabionatali.com



