unofficial mirror of bug-guix@gnu.org 
* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: pelzflorian (Florian Pelz) @ 2022-12-02 17:52 UTC (permalink / raw)
  To: 59784

I aborted graphical system installation (Ctrl-C), retried the
installation and got this:

shepherd: Service guix-daemon has been stopped.
shepherd: Service guix-daemon has been started.
guix system: error: opening file `/gnu/store/4z81a7njyvnwa4kn46ad6vhvi0lcnrhh-shadow-4.9.drv': No such file or directory
Command ("guix" "system" "init" "--fallback" "/mnt/etc/config.scm" "/mnt") exited with exit code 1
Press Enter to continue.

(It told me to press Enter to continue.)  I did so and retried, but again
it did not really retry the installation; I always get this same error
message.

Sorry in case this is a duplicate bug.

Regards,
Florian





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: Ludovic Courtès @ 2022-12-09  9:42 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: 59784

Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> wrote:

> I aborted graphical system installation (Ctrl-C), retried the
> installation and got this:
>
> shepherd: Service guix-daemon has been stopped.
> shepherd: Service guix-daemon has been started.
> guix system: error: opening file `/gnu/store/4z81a7njyvnwa4kn46ad6vhvi0lcnrhh-shadow-4.9.drv': No such file or directory
> Command ("guix" "system" "init" "--fallback" "/mnt/etc/config.scm" "/mnt") exited with exit code 1
> Press Enter to continue.
>
> (It told me to press Enter to continue.)  I did so and retried, but again
> it did not really retry the installation; I always get this same error
> message.

Related to that, I found this old bug:

  https://issues.guix.gnu.org/35543

I tried to reproduce it:

  0. I chose a basic installation to a fully-encrypted disk with a
     single partition.

  1. I hit Ctrl-C while ‘guix system init’ was downloading substitutes.

  2. That led me to a confusing error screen that says “Command cryptsetup
     failed” with Ignore/Abort/Retry buttons.  Shouldn’t it have said
     “Command guix system init failed”?

  3. I resumed starting with the “Configuration File” step, and there
     ‘guix system init’ ran to completion just fine.

Maybe the difference is that you hit Ctrl-C when ‘guix system init’ had
already started copying stuff to /mnt?

Thanks,
Ludo’.





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: Ludovic Courtès @ 2022-12-09 11:11 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: 59784

Ludovic Courtès <ludo@gnu.org> wrote:

>   2. That led me to a confusing error screen that says “Command cryptsetup
>      failed” with Ignore/Abort/Retry buttons.

Actually it’s “External command ("cryptsetup" "close" "cryptroot")
exited with code 5” and “cryptroot device is busy”.

Ludo’.





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: pelzflorian (Florian Pelz) @ 2022-12-10  8:39 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 59784

Ludovic Courtès <ludo@gnu.org> writes:
> I tried to reproduce it:
>
>   0. I chose a basic installation to a fully-encrypted disk with a
>      single partition.
>
>   1. I hit Ctrl-C while ‘guix system init’ was downloading substitutes.
>
>   2. That led me to a confusing error screen that says “Command cryptsetup
>      failed” with Ignore/Abort/Retry buttons.  Shouldn’t it have said
>      “Command guix system init failed”?
>
>   3. I resumed starting with the “Configuration File” step, and there
>      ‘guix system init’ ran to completion just fine.

Yes, these were the steps, except I did not do encryption.  But I had
not told the whole story …  Sorry!

What was missing is that the reason I pressed Ctrl-C was a rare
dropout of my Ethernet controller.  Because it is so rare and has not
happened again since, I reproduced the issue as follows instead:

 0. Use Ethernet for the installation.

 1. During substitute downloading, pull the Ethernet plug.

 2. Get lucky, so that the installation crashes with an error instead of
    just pausing.  If it does not crash, repeat.

 3. Press Ctrl-C.

 4. Resume the installation from the last step.

 5. It will fail now.

I have now uploaded installer-dump-bade9971, a dump of me reproducing the issue.

> Maybe the difference is that you hit Ctrl-C when ‘guix system init’ had
> already started copying stuff to /mnt?

No, like you, I was in the substitute downloading step.

This issue is much rarer than I thought.

Thank you for investigating.

Regards,
Florian





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: pelzflorian (Florian Pelz) @ 2022-12-12 12:07 UTC (permalink / raw)
  To: 59784

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:
> shepherd: Service guix-daemon has been stopped.
> shepherd: Service guix-daemon has been started.
> guix system: error: opening file
> `/gnu/store/4z81a7njyvnwa4kn46ad6vhvi0lcnrhh-shadow-4.9.drv': No such
> file or directory
> Command ("guix" "system" "init" "--fallback" "/mnt/etc/config.scm" "/mnt") exited with exit code 1

Still happens with 1.4.0rc2.  I guess install-system in
gnu/installer/final.scm does not sync the disk on failure?

Regards,
Florian





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: Ludovic Courtès @ 2022-12-13  9:40 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: Mathieu Othacehe, 59784

Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> wrote:

> I now uploaded an installer-dump-bade9971 of me reproducing the issue.

Here’s the relevant syslog excerpt (this was with 1.4.0rc1) where we can
see the point where you unplugged the Ethernet connection:

--8<---------------cut here---------------start------------->8---
Dec 10 09:07:29 localhost installer[399]: running command ("guix" "system" "init" "--fallback" "/mnt/etc/config.scm" "/mnt")
Dec 10 09:07:48 localhost installer[399]: 10.3 MB will be downloaded
Dec 10 09:07:49 localhost installer[399]:  utf8proc-2.5.0  52KiB                594KiB/s 00:00 [##################] 100.0%

[...]

Dec 10 09:08:48 localhost installer[399]: retrying download of '/gnu/store/8zigz7afvz2rjrvrh7zq1d389qbl2684-dbus-1.12.20' with other substitute URLs...
Dec 10 09:08:48 localhost installer[399]: guix substitute: warning: bordeaux.guix.gnu.org: host not found: Name or service not known
Dec 10 09:08:48 localhost installer[399]: guix substitute: error: failed to find alternative substitute for '/gnu/store/8zigz7afvz2rjrvrh7zq1d389qbl2684-dbus-1.12.20'
Dec 10 09:08:48 localhost installer[399]: substitution of /gnu/store/8zigz7afvz2rjrvrh7zq1d389qbl2684-dbus-1.12.20 failed
Dec 10 09:08:49 localhost installer[399]: retrying download of '/gnu/store/mzfkrxd4w8vqrmyrx169wj8wyw7r8i37-bash' with other substitute URLs...
Dec 10 09:08:49 localhost installer[399]: guix substitute: warning: bordeaux.guix.gnu.org: host not found: Name or service not known
Dec 10 09:08:49 localhost installer[399]: guix substitute: error: failed to find alternative substitute for '/gnu/store/mzfkrxd4w8vqrmyrx169wj8wyw7r8i37-bash'
Dec 10 09:08:49 localhost installer[399]: substitution of /gnu/store/mzfkrxd4w8vqrmyrx169wj8wyw7r8i37-bash failed
Dec 10 09:08:49 localhost installer[399]: guix system: error: corrupt input while restoring archive from #<closed: file 7fa02f84d4d0>
Dec 10 09:08:49 localhost installer[399]: command ("guix" "system" "init" "--fallback" "/mnt/etc/config.scm" "/mnt") exited with value 1
Dec 10 09:08:58 localhost vmunix: [ 1220.571986] r8169 0000:02:00.0 enp2s0: Link is Up - 1Gbps/Full - flow control off

[...]

Dec 10 09:09:12 localhost shepherd[1]: Service guix-daemon has been stopped. 
Dec 10 09:09:12 localhost shepherd[1]: Service guix-daemon has been started. 
Dec 10 09:09:17 localhost installer[274]: unmounting "/mnt/" 
Dec 10 09:09:17 localhost vmunix: [ 1239.111442] EXT4-fs (sda3): unmounting filesystem.
Dec 10 09:09:19 localhost installer[274]: running form #<newt-form 2499c90> ("Installation menu") with 0 clients 
Dec 10 09:09:22 localhost installer[274]: running step 'final' 
Dec 10 09:09:22 localhost installer[274]: proceeding with final step 
Dec 10 09:09:23 localhost installer[274]: mounting "/dev/sda3" on "/mnt/" 
Dec 10 09:09:23 localhost vmunix: [ 1245.890840] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota mode: none.
Dec 10 09:09:23 localhost vmunix: [ 1245.893304] Adding 3905532k swap on /dev/sda2.  Priority:-2 extents:1 across:3905532k SSFS
Dec 10 09:09:23 localhost installer[274]: running form #<newt-form 248c440> ("Configuration file") with 0 clients 
Dec 10 09:09:29 localhost installer[437]: install supported locale en_US.utf8. 
Dec 10 09:09:29 localhost shepherd[1]: Service guix-daemon has been stopped. 
Dec 10 09:09:29 localhost shepherd[1]: Service guix-daemon has been started. 
Dec 10 09:09:29 localhost installer[437]: running command ("guix" "system" "init" "--fallback" "/mnt/etc/config.scm" "/mnt") 
Dec 10 09:09:54 localhost installer[437]: 60.8 MB will be downloaded
Dec 10 09:09:54 localhost installer[437]: guix system: error: opening file `/gnu/store/igxf1b1l2b19h7mx2s6r117270dbi6iq-guix-1.4.0rc1.drv': No such file or directory
Dec 10 09:09:54 localhost installer[437]: command ("guix" "system" "init" "--fallback" "/mnt/etc/config.scm" "/mnt") exited with value 1 
Dec 10 09:10:21 localhost shepherd[1]: Service guix-daemon has been stopped. 
Dec 10 09:10:21 localhost shepherd[1]: Service guix-daemon has been started. 
Dec 10 09:10:21 localhost installer[274]: unmounting "/mnt/" 
Dec 10 09:10:21 localhost vmunix: [ 1303.398583] EXT4-fs (sda3): unmounting filesystem.
Dec 10 09:10:28 localhost installer[274]: crashing due to uncaught exception: %exception (#<&user-abort-error>) 
--8<---------------cut here---------------end--------------->8---

It looks like the store is in a broken state, with its database not
matching its actual contents.  The ‘install-system’ procedure is
supposed to protect against that by making a backup of the database
before starting the installation and restoring it afterwards.  (It
apparently worked for me when I interrupted ‘guix system init’ by
hitting C-c.)

I wonder how that failed here.  Mathieu, ideas?

Thanks,
Ludo’.





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: Ludovic Courtès @ 2022-12-13  9:48 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: Mathieu Othacehe, 59784

Hi again,

Ludovic Courtès <ludo@gnu.org> wrote:

> It looks like the store is in a broken state, with its database not
> matching its actual contents.  The ‘install-system’ procedure is
> supposed to protect against that by making a backup of the database
> before starting the installation and restoring it afterwards.  (It
> apparently worked for me when I interrupted ‘guix system init’ by
> hitting C-c.)

Actually, look at the excerpt from final.scm:

         ;; Restart guix-daemon so that it does no keep the MNT namespace
         ;; alive.
         (restart-service 'guix-daemon)
         (copy-file saved-database database-file)

We’re restarting the daemon *before* we have restored the database,
which is wrong: depending on how lucky you are, guix-daemon might load
the old database (all this depends on what exactly happens when sqlite
opens the database, but I think there’s a possibility that it will load
or cache a few things and thus fail to see the changes ‘copy-file’
introduces.)

So my guess is that things will be much better if we swap these two
lines.

Florian, it would be great if you could try that: build a new image from
the ‘version-1.4.0’ branch with these two lines swapped, and run it.  To
produce the image, run:

  ./pre-inst-env guix system image -t iso9660 --label=Guix \
    gnu/system/install.scm

Ludo’.





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: pelzflorian (Florian Pelz) @ 2022-12-13 22:22 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 59784

Hi again.

Ludovic Courtès <ludo@gnu.org> writes:
> So my guess is that things will be much better if we swap these two
> lines.

This was helpful, but not enough.

Swapping them may have improved the likelihood of being able to retry,
but the issue is still there.  I uploaded it as installer-dump-5f9f8dbe,
but it is pretty much the same as the previous dump.

Tomorrow, I will try to add an fsync call in between the two lines.

>   ./pre-inst-env guix system image -t iso9660 --label=Guix \
>     gnu/system/install.scm

Additionally, I had to run “GUIX_ALLOW_ME_TO_USE_PRIVATE_COMMIT=y
make update-guix-package”; otherwise the installer used a Guix that
did not have the lines swapped.

Before that, I also did the GPG authorization dance: my x86 machine is
not trusted with my actual committer GPG keys, so I made sure its dummy
GPG key is in the keyring branch and in the .guix-authorizations file,
and that the default guix channel in guix/channels.scm points to the URL
/home/florian/src/guix and to the commit with the new authorization.
Then I ran guix pull, so that building the installer succeeds.  I did
*not* use ./pre-inst-env.

Regards,
Florian





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: Ludovic Courtès @ 2022-12-13 23:16 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: Mathieu Othacehe, 59784


"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> wrote:

> Ludovic Courtès <ludo@gnu.org> writes:
>> So my guess is that things will be much better if we swap these two
>> lines.
>
> This was helpful, but not enough.

Sorry, I think I wasn’t thinking at full speed.  There needs to be zero
daemons running while we copy the database.  So the real fix is more
like this:


[-- Attachment #2: Type: text/x-patch, Size: 864 bytes --]

diff --git a/gnu/installer/final.scm b/gnu/installer/final.scm
index 044f79372b..9a6bbad122 100644
--- a/gnu/installer/final.scm
+++ b/gnu/installer/final.scm
@@ -213,10 +213,13 @@ (define (assert-exit x)
 
              (set! ret (run-command install-command #:tty? #t)))
            (lambda ()
-             ;; Restart guix-daemon so that it does no keep the MNT namespace
+             ;; Stop guix-daemon so that it does no keep the MNT namespace
              ;; alive.
-             (restart-service 'guix-daemon)
+             (stop-service 'guix-daemon)
+
+             ;; Restore the database and restart it.
              (copy-file saved-database database-file)
+             (start-service 'guix-daemon)
 
              ;; Finally umount the cow-store and exit the container.
              (unmount-cow-store (%installer-target-dir) backing-directory)



>>   ./pre-inst-env guix system image -t iso9660 --label=Guix \
>>     gnu/system/install.scm
>
> Additionally, I had to run “GUIX_ALLOW_ME_TO_USE_PRIVATE_COMMIT=y
> make update-guix-package”; otherwise the installer used a Guix that
> did not have the lines swapped.

Hmm this is surprising because we’re already using (current-guix) in
(gnu installer).

> Before that, I also did the GPG authorization dance: my x86 machine is
> not trusted with my actual committer GPG keys, so I made sure its dummy
> GPG key is in the keyring branch and in the .guix-authorizations file,
> and that the default guix channel in guix/channels.scm points to the URL
> /home/florian/src/guix and to the commit with the new authorization.
> Then I ran guix pull, so that building the installer succeeds.  I did
> *not* use ./pre-inst-env.

Ah yes, apologies.  You should be able to disable authentication with
this:


[-- Attachment #4: Type: text/x-patch, Size: 535 bytes --]

diff --git a/gnu/packages/package-management.scm b/gnu/packages/package-management.scm
index 5a09b1fcf8..374b187d8c 100644
--- a/gnu/packages/package-management.scm
+++ b/gnu/packages/package-management.scm
@@ -625,6 +625,7 @@ (define-public current-guix-package
                (inherit guix)
                (source source)
                (build-system channel-build-system)
+               (arguments '(#:authenticate? #f))
                (inputs '())
                (native-inputs '())
                (propagated-inputs '())))



Thanks a lot for patiently testing, this is very helpful!

Ludo’.


* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: pelzflorian (Florian Pelz) @ 2022-12-14 13:36 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 59784

Partial success, eventually.

First of all:

Ludovic Courtès <ludo@gnu.org> writes:
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> wrote:
>> Additionally, I had to run “GUIX_ALLOW_ME_TO_USE_PRIVATE_COMMIT=y
>> make update-guix-package”; otherwise the installer used a Guix that
>> did not have the lines swapped.
> Hmm this is surprising because we’re already using (current-guix) in
> (gnu installer).

Apparently not.  If I commit only those two diffs from your mail and
build with `./pre-inst-env guix system image -t iso9660 --label=Guix
gnu/system/install.scm`, then

guix gc --references /gnu/store/*-installer-real

prints a Guix package that does not contain any of the changes to
gnu/installer/final.scm.

I used it nonetheless and ran the installer, with surprising failures
that make me doubt the health of my USB drive: `guix system init
--fallback` did not download substitutes but said the ACL seemed to be
uninitialized, and fell back to downloading and building from the tar.xz
sources.  I pulled the Ethernet plug and resumed the installer to run
`guix system init` again, but now it complains that nss-certs is an
unknown package.  Sending a dump crashed the installer.  On TTY3, `ls
/tmp` tells me '-bash: ls: command not found'.

With another USB drive, on another try, the installer again says there
is no ACL and downloads tar.xz sources, but otherwise behaves like rc2
and sometimes breaks when I pull the Ethernet plug; final.scm does not
contain the patch.

Is that second diff of yours perhaps really about ACLs?

Then I did the authorization dance, committed the 'stop-service' diff
and the update-guix-package result, and ran guix pull
--branch=version-1.4.0.  Now I can resume happily after pulling the
Ethernet plug, and even after pressing Ctrl-C just for fun.

Except that it is necessary to resume twice: the first resume always
fails and the second one succeeds.  Does it confuse the two databases?

However, after a large number of resumes, even the second resume no
longer succeeds.  I sent installer-dump-c82c7abf.

I shall try with fsync now.

Regards,
Florian





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: pelzflorian (Florian Pelz) @ 2022-12-14 21:47 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 59784


"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:
> I shall try with fsync now.

Fsyncing the database had no effect.  In addition to Ludo’s
'stop-service' change, I had applied this patch:


[-- Attachment #2: fsync.patch --]
[-- Type: text/x-patch, Size: 773 bytes --]

diff --git a/gnu/installer/final.scm b/gnu/installer/final.scm
index ef487805f0..13deffef85 100644
--- a/gnu/installer/final.scm
+++ b/gnu/installer/final.scm
@@ -217,8 +217,16 @@ (define (assert-exit x)
              ;; alive.
              (stop-service 'guix-daemon)
 
-             ;; Restore the database and restart it.
+             ;; Restore the database.
              (copy-file saved-database database-file)
+
+             ;; Sync it to the filesystem.
+             (let* ((flags O_RDONLY)
+                    (fd (open database-file flags)))
+               (fsync fd)
+               (close fd))
+
+             ;; And restart guix-daemon.
              (start-service 'guix-daemon)
 
              ;; Finally umount the cow-store and exit the container.




The same two problems:

* If I resume a crashed installer, I need to resume twice because the
  first resume always fails immediately.

* With bad luck, it fails permanently: even the second, third, fourth
  and fifth attempts fail.

This is the same as without the fsync; fsync had no effect.  Still, I
uploaded installer-dump-194618fa.

Regards,
Florian


* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: Ludovic Courtès @ 2022-12-14 23:50 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: Mathieu Othacehe, 59784


Grrr, I’m really silly: we have the same problem (copying the database
before the daemon has been stopped) just a few lines above.

How about this:


[-- Attachment #2: Type: text/x-patch, Size: 2139 bytes --]

diff --git a/gnu/installer/final.scm b/gnu/installer/final.scm
index 044f79372b..360b34d8cb 100644
--- a/gnu/installer/final.scm
+++ b/gnu/installer/final.scm
@@ -1,6 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2018, 2020 Mathieu Othacehe <m.othacehe@gmail.com>
-;;; Copyright © 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2019, 2020, 2022 Ludovic Courtès <ludo@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -196,14 +196,15 @@ (define (assert-exit x)
              ;; the loaded cow-store locale files will prevent umounting.
              (install-locale locale)
 
-             ;; Save the database, so that it can be restored once the
-             ;; cow-store is umounted.
+             ;; Stop the daemon and save the database, so that it can be
+             ;; restored once the cow-store is umounted.
+             (stop-service 'guix-daemon)
              (copy-file database-file saved-database)
+
              (mount-cow-store (%installer-target-dir) backing-directory))
            (lambda ()
              ;; We need to drag the guix-daemon to the container MNT
              ;; namespace, so that it can operate on the cow-store.
-             (stop-service 'guix-daemon)
              (start-service 'guix-daemon (list (number->string (getpid))))
 
              (setvbuf (current-output-port) 'none)
@@ -213,10 +214,13 @@ (define (assert-exit x)
 
              (set! ret (run-command install-command #:tty? #t)))
            (lambda ()
-             ;; Restart guix-daemon so that it does no keep the MNT namespace
+             ;; Stop guix-daemon so that it does no keep the MNT namespace
              ;; alive.
-             (restart-service 'guix-daemon)
+             (stop-service 'guix-daemon)
+
+             ;; Restore the database and restart it.
              (copy-file saved-database database-file)
+             (start-service 'guix-daemon)
 
              ;; Finally umount the cow-store and exit the container.
              (unmount-cow-store (%installer-target-dir) backing-directory)



?

This time, I believe we only ever copy the database when we’re sure no
guix-daemon process is accessing it.

Ludo’.


* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: pelzflorian (Florian Pelz) @ 2022-12-15 17:46 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 59784

Hi Ludo…

Ludovic Courtès <ludo@gnu.org> writes:
> This time, I believe we only ever copy the database when we’re sure no
> guix-daemon process is accessing it.

Failure.  In addition to your partially helpful patch from before
(with which a second resume now works most of the time), I have now
also tried the new change:

diff --git a/gnu/installer/final.scm b/gnu/installer/final.scm
index 044f79372b..360b34d8cb 100644
--- a/gnu/installer/final.scm
+++ b/gnu/installer/final.scm
@@ -196,14 +196,15 @@ (define (assert-exit x)
              ;; the loaded cow-store locale files will prevent umounting.
              (install-locale locale)

-             ;; Save the database, so that it can be restored once the
-             ;; cow-store is umounted.
+             ;; Stop the daemon and save the database, so that it can be
+             ;; restored once the cow-store is umounted.
+             (stop-service 'guix-daemon)
              (copy-file database-file saved-database)
+
              (mount-cow-store (%installer-target-dir) backing-directory))
            (lambda ()
              ;; We need to drag the guix-daemon to the container MNT
              ;; namespace, so that it can operate on the cow-store.
-             (stop-service 'guix-daemon)
              (start-service 'guix-daemon (list (number->string (getpid))))

              (setvbuf (current-output-port) 'none)


No additional effect. :(  Perhaps at that point guix-daemon isn’t
doing anything anyway (though the addition makes sense in general and
may help some users).  The same two problems remain: needing to resume
twice each time, and eventually not being able to resume at all
(perhaps some multi-core issue?).  I sent installer-dump-89be04d5.

I also tried interrupting the Ethernet connection on the same machine,
but with an installed 1.4.0rc2 Guix System, during `guix system
reconfigure`.  That works without issues…  There must be corruption in
the installer.

Regards,
Florian





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: pelzflorian (Florian Pelz) @ 2022-12-15 20:44 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 59784


In desperation I also tried adding fsync, to no avail; both issues
remain.  Non-working patch attached.

Maybe dynamic-wind is an inappropriate pattern here?
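A minimal Guile sketch of how dynamic-wind behaves when its body fails may help frame that question.  It assumes only the three-thunk structure visible in the final.scm patches above; the names `events`, `note!` and `run-install-step` are hypothetical, and the recorded symbols merely stand in for saving the database, running `guix system init` and restoring the database.

```scheme
;; Record the order in which each step runs (most recent first).
(define events '())
(define (note! event)
  (set! events (cons event events)))

;; Hypothetical stand-in for the installer's before/body/after
;; structure: set up, run THUNK, and always run the cleanup thunk.
(define (run-install-step thunk)
  (dynamic-wind
    (lambda () (note! 'save-database))
    thunk
    (lambda () (note! 'restore-database))))

;; Simulate 'guix system init' failing halfway through.
(catch #t
  (lambda ()
    (run-install-step
     (lambda ()
       (note! 'guix-system-init)
       (error "substitution failed"))))
  (lambda (key . args)
    (note! 'error-reported)))

(display (reverse events))
(newline)
;; prints: (save-database guix-system-init restore-database error-reported)
```

The after thunk runs while the stack unwinds, before the `catch` handler is invoked, so an ordinary Scheme error in the body does not skip the restore step.  No unwinding happens if the process is killed by a signal it does not handle, and whether that applies to Ctrl-C in the installer is not established here.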

If I interrupt the installation using Ctrl-C (which I normally don’t;
instead I unplug the Ethernet cable), then I have to press Ctrl-C twice.
Maybe that is related to why I need to resume twice?

I’m in the dark.

Regards,
Florian


[-- Attachment #2: fsync-to-no-avail.patch --]
[-- Type: text/x-patch, Size: 1293 bytes --]

does not help

diff --git a/gnu/installer/final.scm b/gnu/installer/final.scm
index 5f720f6641..f5935a29c9 100644
--- a/gnu/installer/final.scm
+++ b/gnu/installer/final.scm
@@ -201,6 +201,12 @@ (define (assert-exit x)
              (stop-service 'guix-daemon)
              (copy-file database-file saved-database)
 
+             ;; Sync it to the filesystem.
+             (let* ((flags O_RDONLY)
+                    (fd (open saved-database flags)))
+               (fsync fd)
+               (close fd))
+
              (mount-cow-store (%installer-target-dir) backing-directory))
            (lambda ()
              ;; We need to drag the guix-daemon to the container MNT
@@ -218,8 +224,16 @@ (define (assert-exit x)
              ;; alive.
              (stop-service 'guix-daemon)
 
-             ;; Restore the database and restart it.
+             ;; Restore the database.
              (copy-file saved-database database-file)
+
+             ;; Sync it to the filesystem.
+             (let* ((flags O_RDONLY)
+                    (fd (open database-file flags)))
+               (fsync fd)
+               (close fd))
+
+             ;; And restart guix-daemon.
              (start-service 'guix-daemon)
 
              ;; Finally umount the cow-store and exit the container.


* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
From: Maxime Devos @ 2022-12-16 13:55 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz), Ludovic Courtès; +Cc: Mathieu Othacehe, 59784


[-- Attachment #1.1.1: Type: text/plain, Size: 1254 bytes --]



On 14-12-2022 22:47, pelzflorian (Florian Pelz) wrote:
> fsyncing the database had no effect.  (In addition to Ludo’s
> 'stop-service', I had done
> 
> 
> fsync.patch
> 
> diff --git a/gnu/installer/final.scm b/gnu/installer/final.scm
> index ef487805f0..13deffef85 100644
> --- a/gnu/installer/final.scm
> +++ b/gnu/installer/final.scm
> @@ -217,8 +217,16 @@ (define (assert-exit x)
>                ;; alive.
>                (stop-service 'guix-daemon)
>   
> -             ;; Restore the database and restart it.
> +             ;; Restore the database.
>                (copy-file saved-database database-file)
> +
> +             ;; Sync it to the filesystem.
> +             (let* ((flags O_RDONLY)
> +                    (fd (open database-file flags)))
> +               (fsync fd)
> +               (close fd))
> +

So, I'm nominally 'on hiatus', but I noticed this mail, and noticed you 
copied a file (and fsync'ed it), but forgot to fsync the directory it 
was copied to -- from what I've read (but I don't recall the source), 
fsyncing the contents of the file isn't enough, you also need to fsync 
the directory such that the new file entry is in the directory after 
crashing.
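To make Maxime's point concrete: the Linux fsync(2) man page notes that calling fsync() on a file does not necessarily ensure that the entry in the directory containing the file has also reached disk. That needs an explicit fsync() on a descriptor for the directory itself. The following is a minimal POSIX sketch in C++, not the installer's Guile Scheme. `durable_copy` and `fsync_path` are hypothetical names that are not part of Guix, and the sketch assumes Linux, where fsync() works on a read-only descriptor and directories can be opened with O_DIRECTORY.

```cpp
#include <cassert>
#include <fcntl.h>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <unistd.h>

namespace fs = std::filesystem;

// Open PATH with FLAGS, fsync it, and close it again.  On Linux a read-only
// descriptor is enough for fsync, for regular files and directories alike.
static void fsync_path(const fs::path& path, int flags) {
    int fd = ::open(path.c_str(), flags);
    if (fd < 0) throw std::runtime_error("cannot open " + path.string());
    int rc = ::fsync(fd);
    ::close(fd);
    if (rc != 0) throw std::runtime_error("cannot fsync " + path.string());
}

// Hypothetical helper: copy FROM to TO so that both the file contents and
// the directory entry naming TO are flushed to stable storage.
void durable_copy(const fs::path& from, const fs::path& to) {
    fs::copy_file(from, to, fs::copy_options::overwrite_existing);
    fsync_path(to, O_RDONLY);                 // the file's data and inode
    fs::path dir = to.parent_path();
    if (dir.empty()) dir = ".";
    fsync_path(dir, O_RDONLY | O_DIRECTORY);  // the entry in its directory
}
```

The second fsync_path call is the step the patch above lacks: without it, a crash can leave correct data in an inode that no directory entry points to.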

Greetings,
Maxime.

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 929 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]


* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-15 20:44                     ` pelzflorian (Florian Pelz)
@ 2022-12-16 16:57                       ` Ludovic Courtès
  2022-12-16 20:28                         ` pelzflorian (Florian Pelz)
  2022-12-17 16:15                         ` Ludovic Courtès
  0 siblings, 2 replies; 25+ messages in thread
From: Ludovic Courtès @ 2022-12-16 16:57 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: Mathieu Othacehe, 59784

Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> Desperately I tried also adding fsync, to no avail, both issues remain.
> Non-working patch attached.
>
> Maybe dynamic-wind is an inappropriate pattern here?
>
> If I interrupt installation using Ctrl-C (which I normally don’t,
> instead I unplug Ethernet), then I have to press Ctrl-C twice.  Maybe
> that could be related to why I need to resume twice?

One finding: when hitting C-c, the dynamic-wind exit handler (the one
that restores the database and umounts the cow store) is *not* executed.

This is because ‘call-with-mnt-container’ sets a SIGINT handler that
terminates that process with SIGKILL (I’m not entirely sure of the
rationale, but said process cannot handle signals in Scheme while it’s
in ‘waitpid’, called from ‘run-command’).
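The effect Ludo describes can be illustrated outside Guix. The sketch below is an analogy in C++, not installer code: the destructor of a hypothetical `ExitHandler` object plays the role of the dynamic-wind exit handler, and `exit_handler_ran` is a made-up test helper. Because SIGKILL cannot be caught or handled, a child process killed with it never runs that cleanup, whereas a child that leaves the guarded region normally does.

```cpp
#include <cassert>
#include <csignal>
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-in for the dynamic-wind exit handler: its destructor creates a
// marker file, playing the role of "restore the database, unmount the store".
struct ExitHandler {
    std::string marker;
    ~ExitHandler() {
        if (std::FILE* f = std::fopen(marker.c_str(), "w")) std::fclose(f);
    }
};

// Fork a child that enters the guarded region.  With kill_it == false the
// child leaves the region normally; with kill_it == true it blocks in pause()
// and the parent sends SIGKILL.  Returns whether the exit handler ran.
bool exit_handler_ran(bool kill_it, const std::string& marker) {
    std::remove(marker.c_str());
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        {
            ExitHandler guard{marker};
            if (kill_it) pause();  // wait here until killed
        }                          // normal exit: guard's destructor runs
        _exit(0);
    }
    if (kill_it) kill(pid, SIGKILL);
    int status = 0;
    waitpid(pid, &status, 0);
    return access(marker.c_str(), F_OK) == 0;
}
```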

I did reproduce the issue in a VM by running “ifconfig ens3 down” in a
tty, or by killing the ‘guix substitute’ process, to cause failure of
‘guix system init’.  In that case the database is indeed restored, but I
occasionally get errors like “/gnu/store/….drv: No such file or
directory”.

Ludo’.





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-16 13:55                 ` Maxime Devos
@ 2022-12-16 20:17                   ` pelzflorian (Florian Pelz)
  0 siblings, 0 replies; 25+ messages in thread
From: pelzflorian (Florian Pelz) @ 2022-12-16 20:17 UTC (permalink / raw)
  To: Maxime Devos; +Cc: Mathieu Othacehe, Ludovic Courtès, 59784

Maxime Devos <maximedevos@telenet.be> writes:
> So, I'm nominally 'on hiatus', but I noticed this mail, and noticed
> you copied a file (and fsync'ed it), but forgot to fsync the directory
> it was copied to -- from what I've read (but I don't recall the
> source), fsyncing the contents of the file isn't enough, you also need
> to fsync the directory such that the new file entry is in the
> directory after crashing.

Ohh indeed!  The Linux man page on fsync confirms it.  That invalidates
my fsync testing, which was on a code path that, as Ludo found out, did
not even run.  But I will remember to fsync the directory in the future.

Thank you very much Maxime!

Regards,
Florian





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-16 16:57                       ` Ludovic Courtès
@ 2022-12-16 20:28                         ` pelzflorian (Florian Pelz)
  2022-12-17 11:01                           ` Ludovic Courtès
  2022-12-17 16:15                         ` Ludovic Courtès
  1 sibling, 1 reply; 25+ messages in thread
From: pelzflorian (Florian Pelz) @ 2022-12-16 20:28 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 59784

Ludovic Courtès <ludo@gnu.org> writes:
> One finding: when hitting C-c, the dynamic-wind exit handler (the one
> that restores the database and umounts the cow store) is *not* executed.

Impressive findings.

Now that you found the dynamic-wind’s out-guard does not even run: Uhh I
had misdiagnosed when I thought your 'stop-service' patch had made a
difference and caused a second resume to work.  A second resume was
already possible on rc2, except that eventually resume stops working,
and on some install attempts with rc2 it stops working right away.

After seeing that you opened a bug#60116 on setsid(), I tested removing
the setsid call and it had no effect, but if the dynamic-wind’s
out-guard does not even run, that is to be expected.


> I did reproduce the issue in a VM by running “ifconfig ens3 down” in a
> tty, or by killing the ‘guix substitute’ process, to cause failure of
> ‘guix system init’.  In that case the database is indeed restored, but I
> occasionally get errors like “/gnu/store/….drv: No such file or
> directory”.

Yes, this is the error message that I get on failing resumes.

Regards,
Florian





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-16 20:28                         ` pelzflorian (Florian Pelz)
@ 2022-12-17 11:01                           ` Ludovic Courtès
  2022-12-17 19:36                             ` pelzflorian (Florian Pelz)
  0 siblings, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2022-12-17 11:01 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: Mathieu Othacehe, 59784

Moin!

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>> One finding: when hitting C-c, the dynamic-wind exit handler (the one
>> that restores the database and umounts the cow store) is *not* executed.
>
> Impressive findings.
>
> Now that you found the dynamic-wind’s out-guard does not even run:

It does not run on C-c, but it does run in other cases, typically if you
just press Enter after reading the message that says “command failed,
press Enter”.

I don’t see how to address the C-c issue so we’ll have to live with it.

Longer-term we may have to find a different strategy than the
‘call-with-mnt-container’ trick, but that’s difficult.

> After seeing that you opened a bug#60116 on setsid(), I tested removing
> the setsid call and it had no effect, but if the dynamic-wind’s
> out-guard does not even run, that is to be expected.

Right; #60116 is related, and it’s not great but it’s not critical.

Ludo’.





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-16 16:57                       ` Ludovic Courtès
  2022-12-16 20:28                         ` pelzflorian (Florian Pelz)
@ 2022-12-17 16:15                         ` Ludovic Courtès
  2022-12-17 19:27                           ` pelzflorian (Florian Pelz)
  1 sibling, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2022-12-17 16:15 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: Mathieu Othacehe, 59784

Ludovic Courtès <ludo@gnu.org> skribis:

> I did reproduce the issue in a VM by running “ifconfig ens3 down” in a
> tty, or by killing the ‘guix substitute’ process, to cause failure of
> ‘guix system init’.  In that case the database is indeed restored, but I
> occasionally get errors like “/gnu/store/….drv: No such file or
> directory”.

The error message that’s haunting us:

  opening file `/gnu/store/….drv': No such file or directory

comes from guix-daemon.  It happens while the client is doing an
‘add-text-to-store’ RPC to add that .drv to the store.
‘LocalStore::addTextToStore’ supposedly creates the .drv file in
/gnu/store and then reads it back (‘registerValidPath’ -> ‘addValidPath’
-> ‘readDerivation’ -> ‘readFile’): this is where it gets ENOENT.

It would suggest that the database is consistent, but that somehow
writes don’t go through the overlay FS.

More investigation is needed, but we may have to live with this bug in
1.4.0.

Ludo’.





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-17 16:15                         ` Ludovic Courtès
@ 2022-12-17 19:27                           ` pelzflorian (Florian Pelz)
  2022-12-17 21:30                             ` Ludovic Courtès
  0 siblings, 1 reply; 25+ messages in thread
From: pelzflorian (Florian Pelz) @ 2022-12-17 19:27 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 59784

[-- Attachment #1: Type: text/plain, Size: 1070 bytes --]

Ludovic Courtès <ludo@gnu.org> writes:
> The error message that’s haunting us:
>
>   opening file `/gnu/store/….drv': No such file or directory
>
> comes from guix-daemon.  It happens while the client is doing an
> ‘add-text-to-store’ RPC to add that .drv to the store.
> ‘LocalStore::addTextToStore’ supposedly creates the .drv file in
> /gnu/store and then reads it back (‘registerValidPath’ -> ‘addValidPath’
> -> ‘readDerivation’ -> ‘readFile’): this is where it gets ENOENT.
>
> It would suggest that the database is consistent, but that somehow
> writes don’t go through the overlay FS.

Most interesting.

I saw a comment
> void LocalStore::registerValidPaths(const ValidPathInfos & infos)
> {
>     /* SQLite will fsync by default, but the new valid paths may not be fsync-ed.
>      * So some may want to fsync them before registering the validity, at the
>      * expense of some speed of the path registering operation. */
>     if (settings.syncBeforeRegistering) sync();

In vain, I therefore tried


[-- Attachment #2: sync-before-registering.patch --]
[-- Type: text/x-patch, Size: 444 bytes --]

diff --git a/nix/libstore/globals.cc b/nix/libstore/globals.cc
index d4f9a46a74..5f8a3a3031 100644
--- a/nix/libstore/globals.cc
+++ b/nix/libstore/globals.cc
@@ -40,7 +40,7 @@ Settings::Settings()
     reservedSize = 8 * 1024 * 1024;
     fsyncMetadata = true;
     useSQLiteWAL = true;
-    syncBeforeRegistering = false;
+    syncBeforeRegistering = true;
     useSubstitutes = true;
     useChroot = false;
     impersonateLinux26 = false;

[-- Attachment #3: Type: text/plain, Size: 43 bytes --]


But it changes nothing.

Regards,
Florian


* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-17 11:01                           ` Ludovic Courtès
@ 2022-12-17 19:36                             ` pelzflorian (Florian Pelz)
  0 siblings, 0 replies; 25+ messages in thread
From: pelzflorian (Florian Pelz) @ 2022-12-17 19:36 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 59784

Ahoi. :)

Ludovic Courtès <ludo@gnu.org> writes:
>> Now that you found the dynamic-wind’s out-guard does not even run:
> It does not run on C-c, but it does run in other cases, typically if you
> just press Enter after reading the message that says “command failed,
> press Enter”.

Ahh.  Then would it be good if you at least pushed the partial fix that
replaces 'restart' with 'stop-service'?  I’m now unsure whether it
affects the likelihood that a second resume works, but maybe it does,
and it is closer to correct.


> I don’t see how to address the C-c issue so we’ll have to live with it.

Yes.  Thank you for all investigations!

Regards,
Florian





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-17 19:27                           ` pelzflorian (Florian Pelz)
@ 2022-12-17 21:30                             ` Ludovic Courtès
  2022-12-18  0:23                               ` Ludovic Courtès
  0 siblings, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2022-12-17 21:30 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: Mathieu Othacehe, 59784

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> I saw a comment
>> void LocalStore::registerValidPaths(const ValidPathInfos & infos)
>> {
>>     /* SQLite will fsync by default, but the new valid paths may not be fsync-ed.
>>      * So some may want to fsync them before registering the validity, at the
>>      * expense of some speed of the path registering operation. */
>>     if (settings.syncBeforeRegistering) sync();
>
> In vain, I therefore tried

Yeah, I don’t think this has much to do with syncing data on disk.  It’s
an inconsistency between the store database and the actual store.
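One way such an inconsistency can show up is as entries that the database records as valid while the corresponding files are missing from /gnu/store. A check for that direction takes only a few lines. This is a hypothetical diagnostic sketched in C++, not a Guix tool. It assumes the list of valid paths has already been obtained from the database by some other means, since querying SQLite is left out.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical diagnostic: VALID holds the store items that the database
// records as valid.  Return those that do not exist on disk.  symlink_status
// is used so that a store item that is itself a symlink counts as present.
std::vector<std::string> missing_valid_paths(const std::vector<std::string>& valid) {
    std::vector<std::string> missing;
    for (const auto& item : valid) {
        std::error_code ec;
        fs::file_status st = fs::symlink_status(item, ec);
        if (!fs::exists(st)) missing.push_back(item);
    }
    return missing;
}
```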

Ludo’.





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-17 21:30                             ` Ludovic Courtès
@ 2022-12-18  0:23                               ` Ludovic Courtès
  0 siblings, 0 replies; 25+ messages in thread
From: Ludovic Courtès @ 2022-12-18  0:23 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: Mathieu Othacehe, 59784

After spending a few more hours on this, I got convinced that upon
restarting guix-daemon, even though we had restored
/var/guix/db/db.sqlite, the presence of stale db.sqlite-{wal,shm} files
could lead sqlite to do as if transactions in the WAL file had been
committed.

Commit 495c50008be91429ebea3805e161a1e385a2a572 deletes these two
files, and it appears to solve the problem for me.
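The idea behind that commit can be sketched as follows. In WAL mode, which guix-daemon uses by default per `useSQLiteWAL = true` in the globals.cc hunk above, SQLite keeps a `-wal` write-ahead log and a `-shm` index next to the main database file. As described above, a leftover WAL may cause transactions from the previous attempt to be treated as committed. The C++ below is only an illustration and is not the Scheme code of commit 495c50008b. `restore_sqlite_database` is a hypothetical name, and the sketch assumes nothing holds the database open while it runs, that is, guix-daemon is stopped.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <initializer_list>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: restore the SQLite database DB from BACKUP and delete
// the stale write-ahead log (DB-wal) and shared-memory index (DB-shm) left
// over from the previous run.  Only call it while nothing has DB open.
void restore_sqlite_database(const fs::path& backup, const fs::path& db) {
    fs::copy_file(backup, db, fs::copy_options::overwrite_existing);
    for (const char* suffix : {"-wal", "-shm"}) {
        fs::path side = db;
        side += suffix;        // e.g. db.sqlite-wal, db.sqlite-shm
        std::error_code ec;
        fs::remove(side, ec);  // a missing file is not an error
    }
}
```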

I also pushed the patch previously shared in this thread, to make sure
db.sqlite is only copied when guix-daemon is stopped.

So we have this:

  495c50008b installer: final: Delete SQLite WAL and shm files upon completion.
  9b6703eabe installer: final: Stop guix-daemon before accessing store database.

I’ll go ahead and prepare for the release as planned, to be published on Monday.

Ludo’.





* bug#59784: [version 1.4.0rc1] Retrying a failed install fails
  2022-12-14 21:47               ` pelzflorian (Florian Pelz)
  2022-12-14 23:50                 ` Ludovic Courtès
  2022-12-16 13:55                 ` Maxime Devos
@ 2022-12-18 16:41                 ` pelzflorian (Florian Pelz)
  2 siblings, 0 replies; 25+ messages in thread
From: pelzflorian (Florian Pelz) @ 2022-12-18 16:41 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 59784-done

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:
> * If I resume a crashed installer, I need to resume twice because the
>   first resume always fails immediately.

Hooray, you fixed it.  Ludo, your debugging speed is miraculous.  I did
not know SQLite uses multiple files per database.


> * With bad luck, it permanently fails, even a second, third, fourth,
>   fifth time fail.

It can still permanently fail to resume.  For example, sometimes when I
press Ctrl-C during the download of a substitute, it keeps saying that
nss-certs is an unknown package, but that may be too rare to happen by
chance and is not what this bug was about.

Closing!

Regards,
Florian





Thread overview: 25+ messages
2022-12-02 17:52 bug#59784: [version 1.4.0rc1] Retrying a failed install fails pelzflorian (Florian Pelz)
2022-12-09  9:42 ` Ludovic Courtès
2022-12-09 11:11   ` Ludovic Courtès
2022-12-10  8:39   ` pelzflorian (Florian Pelz)
2022-12-13  9:40     ` Ludovic Courtès
2022-12-13  9:48       ` Ludovic Courtès
2022-12-13 22:22         ` pelzflorian (Florian Pelz)
2022-12-13 23:16           ` Ludovic Courtès
2022-12-14 13:36             ` pelzflorian (Florian Pelz)
2022-12-14 21:47               ` pelzflorian (Florian Pelz)
2022-12-14 23:50                 ` Ludovic Courtès
2022-12-15 17:46                   ` pelzflorian (Florian Pelz)
2022-12-15 20:44                     ` pelzflorian (Florian Pelz)
2022-12-16 16:57                       ` Ludovic Courtès
2022-12-16 20:28                         ` pelzflorian (Florian Pelz)
2022-12-17 11:01                           ` Ludovic Courtès
2022-12-17 19:36                             ` pelzflorian (Florian Pelz)
2022-12-17 16:15                         ` Ludovic Courtès
2022-12-17 19:27                           ` pelzflorian (Florian Pelz)
2022-12-17 21:30                             ` Ludovic Courtès
2022-12-18  0:23                               ` Ludovic Courtès
2022-12-16 13:55                 ` Maxime Devos
2022-12-16 20:17                   ` pelzflorian (Florian Pelz)
2022-12-18 16:41                 ` pelzflorian (Florian Pelz)
2022-12-12 12:07 ` pelzflorian (Florian Pelz)

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git