unofficial mirror of bug-guix@gnu.org 
* bug#56209: Shepherd 0.9 not cleanly unmounting root
From: angry rectangle @ 2022-06-25  5:27 UTC (permalink / raw)
  To: 56209

Since the upgrade to shepherd 0.9, I get "recovering journal" every single time I start my computer.
To be specific, "recovering journal" appears after I enter my encryption password in the initrd.
I assume this means the filesystem wasn't cleanly unmounted.
I am doing a proper shutdown, using either "reboot" or "halt." 

I've attached the minimal config I've been using.
It's nothing special other than an encrypted root.
I'm using an SSD with a GPT partition table.
No custom packages or external channels were used when configuring the system.

This is for my desktop computer, but I have the exact same problem with a similar minimal config on my laptop.
It's mostly the same situation there: an SSD, a GPT partition table, and an encrypted root.

The guix commit 400c9ed3d779308e56038305d40cd93acb496180 is the specific commit that upgrades shepherd and causes me this problem. The previous commit is fine.
I can confirm that it's still broken on recent commits. I'm on 696e2cc345f015c32f211bf0d0330c04b1cf5f15.

Thanks





* bug#56209:
From: angry rectangle @ 2022-06-25  5:42 UTC (permalink / raw)
  To: 56209

[-- Attachment #1: Type: text/plain, Size: 23 bytes --]

forgot the attachment.

[-- Attachment #2: guix system config --]
[-- Type: text/plain, Size: 1214 bytes --]

(use-modules (gnu) (gnu system) (gnu bootloader grub) (gnu packages dns))
(use-package-modules linux certs)
(use-service-modules networking)

(operating-system
 (kernel linux-libre)
 
 (host-name "myhostname")
 (timezone "America/New_York")
 (locale "en_US.utf8")

 (bootloader (bootloader-configuration
              (bootloader grub-bootloader)
              (target #f)))

 (mapped-devices
  (list (mapped-device
         (source (uuid "my--uuid--goes--here"))
         (target "cryptroot")
         (type luks-device-mapping))))
 (file-systems
  (append
   (list (file-system
          (device "/dev/mapper/cryptroot")
          (mount-point "/")
          (type "ext4")
          (dependencies mapped-devices)))
   %base-file-systems))
 
 (sudoers-file (plain-file "sudoers" "\
root ALL=(ALL) ALL
%wheel ALL=(ALL) NOPASSWD:ALL\n"))
 (users (cons (user-account
               (name "angry")
               (group "users")
               (password (crypt "a" "$6$abc"))
               (supplementary-groups '("wheel" "audio" "video")))
              %base-user-accounts))
 (packages (cons nss-certs %base-packages))
 (services (cons*
            (service dhcp-client-service-type)
            %base-services)))


* bug#56209: Shepherd 0.9 not cleanly unmounting root
From: Ludovic Courtès @ 2022-06-27 21:50 UTC (permalink / raw)
  To: angry rectangle; +Cc: 56209

[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]

Hi,

angry rectangle <angryrectangle@cock.li> skribis:

> Since the upgrade to shepherd 0.9, I get "recovering journal" every single time I start my computer.
> To be specific, "recovering journal" appears after I enter my encryption password in the initrd.
> I assume this means the filesystem wasn't cleanly unmounted.
> I am doing a proper shutdown, using either "reboot" or "halt." 

I can see that as well.

> The guix commit 400c9ed3d779308e56038305d40cd93acb496180 is the specific commit that upgrades shepherd and causes me this problem. The previous commit is fine.
> I can confirm that it's still broken on recent commits. I'm on 696e2cc345f015c32f211bf0d0330c04b1cf5f15.

Preliminary investigation suggests this is because shepherd doesn’t
close log files before the root file system is remounted read-only at
shutdown (in 0.9, those specified as #:log-file to
‘make-forkexec-constructor’ & co. are opened by PID 1; conversely,
shepherd 0.8 would open them in the child process).
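To make the difference concrete, here is a minimal Guile sketch contrasting the two models. It is not shepherd's actual code: the procedures `spawn/log-in-child` and `spawn/log-in-parent` are hypothetical, and the 0.9-style variant leaves out the logging fiber that copies the child's output into the log file. The point is only who holds the log file open. In the first model the descriptor lives in the child; in the second, the port returned to the parent (PID 1, in shepherd's case) keeps the file open for writing, and while it is open, remounting the file system that holds it read-only fails with EBUSY.

```scheme
;; Hypothetical sketch, NOT shepherd's real code.

;; shepherd 0.8 style (simplified): the *child* opens LOG-FILE and makes
;; it its stdout/stderr, so the parent never holds a descriptor for it.
(define (spawn/log-in-child program log-file . args)
  (let ((pid (primitive-fork)))
    (if (zero? pid)
        (catch #t
          (lambda ()
            (let ((fd (open-fdes log-file
                                 (logior O_WRONLY O_CREAT O_APPEND) #o640)))
              (dup2 fd 1)                       ;stdout -> log file
              (dup2 fd 2)                       ;stderr -> log file
              (apply execl program program args)))
          (lambda _
            (primitive-exit 127)))              ;open/exec failed: quit child
        pid)))

;; shepherd 0.9 style (simplified): the child writes to a pipe and the
;; *parent* opens LOG-FILE.  The returned log port stays open in the
;; parent until whatever copies the child's output into it closes it.
(define (spawn/log-in-parent program log-file . args)
  (let* ((ends (pipe))                          ;(read-port . write-port)
         (pid  (primitive-fork)))
    (if (zero? pid)
        (catch #t
          (lambda ()
            (close-port (car ends))
            (dup2 (port->fdes (cdr ends)) 1)    ;stdout -> pipe
            (dup2 (port->fdes (cdr ends)) 2)    ;stderr -> pipe
            (apply execl program program args))
          (lambda _
            (primitive-exit 127)))
        (begin
          (close-port (cdr ends))
          (values pid
                  (car ends)                    ;child's output
                  (open-file log-file "a")))))) ;log file, open in parent
```

With the second model, something in PID 1 has to close the log port before the root file system can be remounted read-only; in the later fix in this thread, that is left to the logging fibers, which get a chance to run because the 'stop' method sleeps between remount attempts.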

To be continued…

Thanks for reporting the issue and finding the offending commit!

Ludo’.

PS: Below are my (ugly) debugging tricks, for posterity.  To see those
    messages, you typically need to start a VM with ‘-serial stdio’ and
    pass “console=ttyS0” to the kernel.  (It’s best to start a
    standalone VM with an image created by ‘guix system image -t
    qcow2’.)
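For the “console=ttyS0” part, one option (an assumption on my side, not necessarily what was used here) is to set the `kernel-arguments` field of the test VM's `operating-system` declaration before building the image with ‘guix system image -t qcow2’. This is a config fragment, not a complete declaration:

```scheme
;; Hypothetical excerpt: only the relevant field is shown.  Passing
;; "console=ttyS0" makes /dev/console the first serial port, which QEMU
;; connects to the terminal when started with '-serial stdio'.
(operating-system
  ;; ... host-name, bootloader, file-systems, etc. as usual ...
  (kernel-arguments (list "console=ttyS0")))
```

Using ‘list’ like this replaces the default kernel arguments (‘%default-kernel-arguments’); consing onto that list instead would keep them.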


[-- Attachment #2: Type: text/x-patch, Size: 4488 bytes --]

diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index d58afb27e3..25d747d226 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -299,6 +299,9 @@ (define %root-file-system-shepherd-service
    (stop #~(lambda _
              ;; Return #f if successfully stopped.
              (sync)
+             (call-with-port (open-file "/dev/console" "w0")
+               (lambda (port)
+                 (display "This is my last message.\n" port)))
 
              (call-with-blocked-asyncs
               (lambda ()
@@ -314,11 +317,24 @@ (define %root-file-system-shepherd-service
                   ;; Close /dev/console.
                   (for-each close-fdes '(0 1 2))
 
-                  ;; At this point, there are no open files left, so the
-                  ;; root file system can be re-mounted read-only.
-                  (mount #f "/" #f
-                         (logior MS_REMOUNT MS_RDONLY)
-                         #:update-mtab? #f)
+                  (open-fdes "/dev/null" O_RDONLY)
+                  (open-fdes "/dev/console" O_WRONLY)
+                  (open-fdes "/dev/console" O_WRONLY)
+                  (current-output-port (fdopen 1 "w0"))
+                  (current-error-port (fdopen 2 "w0"))
+                  (pk 'umount-root)
+
+                  (catch 'system-error
+                    (lambda ()
+                      ;; At this point, there are no open files left, so the
+                      ;; root file system can be re-mounted read-only.
+                      (mount #f "/" #f
+                             (logior MS_REMOUNT MS_RDONLY)
+                             #:update-mtab? #f))
+                    (lambda args
+                      (pk 'umount-root-error args)
+                      #f))
+                  (pk 'done-umount-root)
 
                   #f)))))
    (respawn? #f)))
@@ -406,7 +422,28 @@ (define (file-system-shepherd-service file-system)
                       ;; Make sure PID 1 doesn't keep TARGET busy.
                       (chdir "/")
 
-                      (umount #$target)
+                      (call-with-port (open-file "/dev/console" "w0")
+                        (lambda (port)
+                          (parameterize ((current-output-port port)
+                                         (current-error-port port))
+                            (pk 'umount #$target)
+                            #$(if (file-system-mount-may-fail? file-system)
+                                  #~(catch 'system-error
+                                      (lambda ()
+                                        (umount #$target))
+                                      (const #f))
+                                  #~(catch 'system-error
+                                      (lambda ()
+                                        (umount #$target))
+                                      (lambda args
+                                        (pk 'umount-error args)
+                                        (system* #$(file-append (@ (gnu
+                                                                    packages
+                                                                    lsof)
+                                                                   lsof)
+                                                                "/bin/lsof"))
+                                        #f)))
+                            (pk 'done-umount #$target))))
                       #f))
 
             ;; We need additional modules.
diff --git a/gnu/system/examples/bare-bones.tmpl b/gnu/system/examples/bare-bones.tmpl
index 387e4b12ba..1f9012c167 100644
--- a/gnu/system/examples/bare-bones.tmpl
+++ b/gnu/system/examples/bare-bones.tmpl
@@ -2,8 +2,8 @@
 ;; for a "bare bones" setup, with no X11 display server.
 
 (use-modules (gnu))
-(use-service-modules networking ssh)
-(use-package-modules screen ssh)
+(use-service-modules networking ssh shepherd)
+(use-package-modules screen ssh admin)
 
 (operating-system
   (host-name "komputilo")
@@ -38,6 +38,13 @@
                                         "audio" "video")))
                %base-user-accounts))
 
+  (essential-services
+   (modify-services (operating-system-default-essential-services
+                     this-operating-system)
+     (shepherd-root-service-type
+      config => (shepherd-configuration
+                 (shepherd shepherd-0.8)))))
+
   ;; Globally-installed packages.
   (packages (cons screen %base-packages))
 


* bug#56209: Shepherd 0.9 not cleanly unmounting root
From: Ludovic Courtès @ 2022-06-28 22:02 UTC (permalink / raw)
  To: angry rectangle; +Cc: 56209

[-- Attachment #1: Type: text/plain, Size: 164 bytes --]

Hi,

I believe the attached patch fixes the problem.  I’ll do more testing on
my side but I’d be grateful if someone would give it a try too.

Ludo’.


[-- Attachment #2: Type: text/x-patch, Size: 3662 bytes --]

diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scm
index 17b7b38a15..dea58354d9 100644
--- a/gnu/packages/admin.scm
+++ b/gnu/packages/admin.scm
@@ -328,7 +328,18 @@ (define-public shepherd-0.9
                                   version ".tar.gz"))
               (sha256
                (base32
-                "0l2arn6gsyw88xk9phxnyplvv1mn8sqp3ipgyyb0nszdzvxlgd36"))))
+                "0l2arn6gsyw88xk9phxnyplvv1mn8sqp3ipgyyb0nszdzvxlgd36"))
+              (modules '((guix build utils)))
+              (snippet
+               ;; Avoid continuation barriers so (@ (fibers) sleep) can be
+               ;; called from a service's 'stop' method
+               '(substitute* "modules/shepherd/service.scm"
+                  (("call-with-blocked-asyncs")   ;in 'stop' method
+                   "(lambda (thunk) (thunk))")
+                  (("\\(for-each-service\n")      ;in 'shutdown-services'
+                   "((lambda (proc)
+                       (for-each proc
+                                 (fold-services cons '())))\n")))))
     (arguments
      (list #:configure-flags #~'("--localstatedir=/var")
            #:make-flags #~'("GUILE_AUTO_COMPILE=0")
diff --git a/gnu/services/base.scm b/gnu/services/base.scm
index d58afb27e3..1fd4cd84f3 100644
--- a/gnu/services/base.scm
+++ b/gnu/services/base.scm
@@ -300,27 +300,36 @@ (define %root-file-system-shepherd-service
              ;; Return #f if successfully stopped.
              (sync)
 
-             (call-with-blocked-asyncs
-              (lambda ()
-                (let ((null (%make-void-port "w")))
-                  ;; Close 'shepherd.log'.
-                  (display "closing log\n")
-                  ((@ (shepherd comm) stop-logging))
+             (let ((null (%make-void-port "w")))
+               ;; Close 'shepherd.log'.
+               (display "closing log\n")
+               ((@ (shepherd comm) stop-logging))
 
-                  ;; Redirect the default output ports..
-                  (set-current-output-port null)
-                  (set-current-error-port null)
+               ;; Redirect the default output ports..
+               (set-current-output-port null)
+               (set-current-error-port null)
 
-                  ;; Close /dev/console.
-                  (for-each close-fdes '(0 1 2))
+               ;; Close /dev/console.
+               (for-each close-fdes '(0 1 2))
 
-                  ;; At this point, there are no open files left, so the
-                  ;; root file system can be re-mounted read-only.
-                  (mount #f "/" #f
-                         (logior MS_REMOUNT MS_RDONLY)
-                         #:update-mtab? #f)
+               (let loop ((n 10))
+                 (unless (catch 'system-error
+                           (lambda ()
+                             ;; At this point, there are no open files left, so the
+                             ;; root file system can be re-mounted read-only.
+                             (mount #f "/" #f
+                                    (logior MS_REMOUNT MS_RDONLY)
+                                    #:update-mtab? #f)
+                             #t)
+                           (const #f))
+                   (unless (zero? n)
+                     ;; Yield to the other fibers.  That gives logging fibers
+                     ;; an opportunity to close log files so the 'mount' call
+                     ;; doesn't fail with EBUSY.
+                     ((@ (fibers) sleep) 1)
+                     (loop (- n 1)))))
 
-                  #f)))))
+               #f)))
    (respawn? #f)))
 
 (define root-file-system-service-type


* bug#56209: Shepherd 0.9 not cleanly unmounting root
From: angry rectangle @ 2022-06-29  0:18 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 56209


I tried your patch on one of my computers and it works.
Thank you.





* bug#56209: Shepherd 0.9 not cleanly unmounting root
From: Ludovic Courtès @ 2022-07-01 10:29 UTC (permalink / raw)
  To: angry rectangle; +Cc: 56209-done

Hi,

angry rectangle <angryrectangle@cock.li> skribis:

> I tried your patch on one of my computers and it works.

Thanks for testing.

Pushed as 0483c71cc5aeb3b69f6deb154fe12c0b2e6dc17f.  The reason it took
me more time is that I wanted to have a system test for that to make
sure it doesn’t come back to haunt us in the future.  Now we should be
fine.  :-)

Ludo’.





Thread overview: 6+ messages
2022-06-25  5:27 bug#56209: Shepherd 0.9 not cleanly unmounting root angry rectangle
2022-06-25  5:42 ` bug#56209: angry rectangle
2022-06-27 21:50 ` bug#56209: Shepherd 0.9 not cleanly unmounting root Ludovic Courtès
2022-06-28 22:02   ` Ludovic Courtès
2022-06-29  0:18     ` angry rectangle
2022-07-01 10:29       ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
