unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures
@ 2022-12-07 13:50 Ludovic Courtès
  2022-12-08 18:02 ` Mathieu Othacehe
  0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2022-12-07 13:50 UTC (permalink / raw)
  To: 59884; +Cc: Mathieu Othacehe

Hi,

The ‘gui-installed-desktop-os-encrypted’ test fails intermittently like
so:

--8<---------------cut here---------------start------------->8---
Dec  7 12:14:01 localhost installer[253]: ^M^[[Kguix system: ^[[1m^[[0mbootloader successfully installed on /dev/vda^M 
Dec  7 12:14:01 localhost installer[253]: command ("guix" "system" "init" "--fallback" "--no-grafts" "--no-substitutes" "/mnt/etc/config.scm" "/mnt") succeeded 
conversation expecting pattern ((quote installation-complete))
Dec  7 12:14:01 localhost shepherd[1]: Service guix-daemon has been stopped. 
Dec  7 12:14:01 localhost shepherd[1]: Service guix-daemon has been started. 
Dec  7 12:14:01 localhost installer[253]: crashing due to uncaught exception: system-error ("umount" "~S: ~A" ("/remove" "Device or resource busy") (16)) 
Dec  7 12:14:01 localhost installer[253]: running form #<newt-form aec220> ("Unexpected problem") with 1 clients 
Dec  7 12:14:01 localhost installer[253]: form #<newt-form aec220> ("Unexpected problem"): client 20 replied #t 
/gnu/store/2bdm6l1j7siky5vadr62l3ns2z8qz75a-shepherd-marionette.scm:1:1718: ERROR:
  1. &pattern-not-matched:
      pattern: ((quote installation-complete))
      sexp: (contents-dialog (title "Unexpected problem") (text "The installer has encountered an unexpected problem. The backtrace is displayed below. You may choose to exit or create a dump archive.") (content "In ./gnu/installer/steps.scm:\n   150:13 19 (run ((substitutes . #f) (network (select-technology . #<<technology> name: \"Wired\" type: \"ethernet\" powered?: #t connected?: #t>) (power-technology . #<unspecified>) (# . #<<???>) ???) ???) ???)\n   150:13 18 (run ((user #<<user> name: \"root\" real-name: \"\" group: \"users\" password: <secret> home-directory: \"/root\"> #<<user> name: \"alice\" real-name: \"Alice\" group: \"users\" password: <s???> ???) ???) ???)\n   150:13 17 (run ((services #<<system-service> name: \"GNOME\" type: desktop recommended?: #f snippet: ((service gnome-desktop-service-type)) packages: ()> #<<system-service> name: \"Xfce\" ty???> ???) ???) ???)\n   148:23 16 (run ((partition #<<user-partition> name: #f type: normal file-name: \"/dev/vda1\" disk-file-name: \"/dev/vda\" crypt-label: #f crypt-password: #f fs-type: ext4 bootable?: #t esp?:???> ???) ???) ???)\nIn ./gnu/installer/newt/final.scm:\n   132:11 15 (run-final-page ((partition #<<user-partition> name: #f type: normal file-name: \"/dev/vda1\" disk-file-name: \"/dev/vda\" crypt-label: #f crypt-password: #f fs-type: ext4 bootable???> ???) ???) ???)\n   105:21 14 (run-install-shell \"en_HK.utf8\" #:users _)\nIn ./gnu/installer/final.scm:\n    191:5 13 (install-system \"en_HK.utf8\" #:users _)\n   113:13 12 (call-with-mnt-container #<procedure 7f0e0b7d2f00 at ./gnu/installer/final.scm:192:7 ()>)\nIn gnu/build/linux-container.scm:\n   265:16 11 (run-container _ _ (mnt) _ #<procedure 7f0e0b7d2f00 at ./gnu/installer/final.scm:192:7 ()> #:guest-uid _ #:guest-gid _)\nIn ./gnu/installer/final.scm:\n   222:13 10 (_)\nIn gnu/build/install.scm:\n    290:5  9 (unmount-cow-store \"/mnt\" \"/tmp/guix-inst\")\nIn guix/build/syscalls.scm:\n   600:10  8 (_ _ _ #:update-mtab? _)\nIn ice-9/boot-9.scm:\n  1685:16  7 (raise-exception _ #:continuable? _)\n  1780:13  6 (_ #<&compound-exception components: (#<&external-error> #<&origin origin: \"umount\"> #<&message message: \"~S: ~A\"> #<&irritants irritants: (\"/remove\" \"Device or resource busy\")> #<&ex???>)\nIn ice-9/eval.scm:\n    619:8  5 (_ #(#(#(#<directory (guile-user) 7f0e0cee8c80>) system-error (\"umount\" \"~S: ~A\" (\"/remove\" \"Device or resource busy\") (16))) #<variable 7f0e08731270 value: #<unspecified>> #<varia???> ???))\n   626:19  4 (_ #(#(#(#<directory (guile-user) 7f0e0cee8c80>) system-error (\"umount\" \"~S: ~A\" (\"/remove\" \"Device or resource busy\") (16))) #<variable 7f0e08731270 value: #<unspecified>> #<varia???> ???))\nIn ./gnu/installer/dump.scm:\n     58:4  3 (prepare-dump system-error (\"umount\" \"~S: ~A\" (\"/remove\" \"Device or resource busy\") (16)) #:result _)\nIn ice-9/ports.scm:\n   433:17  2 (call-with-output-file _ _ #:binary _ #:encoding _)\nIn ./gnu/installer/dump.scm:\n    60:27  1 (_ #<output: installer-backtrace 7>)\nIn unknown file:\n           0 (make-stack #t)\n./gnu/installer/dump.scm:62:36: In procedure umount: \"/remove\": Device or resource busy\n"))
Dec  7 12:14:01 localhost installer[198]: unmounting "/mnt/" 

Backtrace:
           2 (primitive-load "/gnu/store/s6a8v2l4387bgsp0w0pcbg4wjsy?")
In ice-9/eval.scm:
   191:35  1 (_ #f)
    619:8  0 (_ #(#<directory (guile-user) 7fffeffcfc80> #<variabl?>))

ice-9/eval.scm:619:8: Throw to key `marionette-eval-failure' with args `((quote (complete-installation installer-socket)))'.
builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
@ build-failed /gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv - 1 builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
cannot build derivation `/gnu/store/dr20sisps9rlpbq0vfzncdiyymrd5r0i-gui-installed-desktop-os-encrypted.drv': 1 dependencies couldn't be built
--8<---------------cut here---------------end--------------->8---

(From <https://ci.guix.gnu.org/build/224930/details>.)

Does that ring a bell, Mathieu?

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 4+ messages in thread

* bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures
  2022-12-07 13:50 bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures Ludovic Courtès
@ 2022-12-08 18:02 ` Mathieu Othacehe
  2022-12-09 22:32   ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Mathieu Othacehe @ 2022-12-08 18:02 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 59884


Hello,

> ice-9/eval.scm:619:8: Throw to key `marionette-eval-failure' with args `((quote (complete-installation installer-socket)))'.
> builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
> @ build-failed /gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv - 1 builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
> cannot build derivation `/gnu/store/dr20sisps9rlpbq0vfzncdiyymrd5r0i-gui-installed-desktop-os-encrypted.drv': 1 dependencies couldn't be built
>
> (From <https://ci.guix.gnu.org/build/224930/details>.)
>
> Does that ring a bell, Mathieu?

I spent days on that issue before. It used to show up on all installer
tests, and even on real hardware, then
8ce6f4dc2879919c12bc76a2f4b01200af97e01 mitigated it.

The installation is now made in a container to make sure that we are
later on able to umount the store overlay even though some background
processes such as kmscon or udev opened files from the overlay.

Now the issue only shows up on that specific test and is intermittent as
you noticed.

To be honest, that was quite painful to debug and I'm a bit scared to
jump back in. I think I had the marionette produce some lsof reports
back then, or something like that. I very much regret not to have kept
notes somewhere.

Mathieu




^ permalink raw reply	[flat|nested] 4+ messages in thread

* bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures
  2022-12-08 18:02 ` Mathieu Othacehe
@ 2022-12-09 22:32   ` Ludovic Courtès
  2022-12-10 15:35     ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2022-12-09 22:32 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 59884

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

> I spent days on that issue before. It used to show up on all installer
> tests, and even on real hardware, then
> 8ce6f4dc2879919c12bc76a2f4b01200af97e01 mitigated it.
>
> The installation is now made in a container to make sure that we are
> later on able to umount the store overlay even though some background
> processes such as kmscon or udev opened files from the overlay.

Right.

> Now the issue only shows up on that specific test and is intermittent as
> you noticed.

On this particular test it seems to be frequent.

> To be honest, that was quite painful to debug and I'm a bit scared to
> jump back in. I think I had the marionette produce some lsof reports
> back then, or something like that. I very much regret not to have kept
> notes somewhere.

Yeah.  One possibility is timing: (restart-service 'guix-daemon) kills
the daemon’s process group, waits for the group leader to terminate,
then starts the daemon.  I think there’s a possibility that other
processes in the group (like a ‘guix substitute’ child) are still alive
at that point and they might be the one keeping the device busy.
Sleeping for a couple of seconds should allow us to kinda verify that
hypothesis.  However, the fact that it’s only in the cryptsetup case
suggests this hypothesis might well be completely bogus.

But yeah, invoking ‘lsof’ would tell us.

I guess we’ll leave that as future work though.

Thanks for your feedback!

Ludo’.






^ permalink raw reply	[flat|nested] 4+ messages in thread

* bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures
  2022-12-09 22:32   ` Ludovic Courtès
@ 2022-12-10 15:35     ` Ludovic Courtès
  0 siblings, 0 replies; 4+ messages in thread
From: Ludovic Courtès @ 2022-12-10 15:35 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 59884-done

Ludovic Courtès <ludo@gnu.org> skribis:

> Yeah.  One possibility is timing: (restart-service 'guix-daemon) kills
> the daemon’s process group, waits for the group leader to terminate,
> then starts the daemon.  I think there’s a possibility that other
> processes in the group (like a ‘guix substitute’ child) are still alive
> at that point and they might be the one keeping the device busy.
> Sleeping for a couple of seconds should allow us to kinda verify that
> hypothesis.

For good measure, I threw a retry loop in commit
61b7e9687757aff013b99e4ab15669a950c8b222.  The test passes on my laptop
and it passed on ci.guix as well:

  https://ci.guix.gnu.org/eval/31297

I’m considering the case closed and now getting ready for 1.4.0rc2.

Ludo’.




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-12-10 15:36 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-07 13:50 bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures Ludovic Courtès
2022-12-08 18:02 ` Mathieu Othacehe
2022-12-09 22:32   ` Ludovic Courtès
2022-12-10 15:35     ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).