[-- Attachment #1: Type: text/plain, Size: 3361 bytes --] Hello, Using Guix from the master branch at commit b2122b07dc24007263b92247cc479713c2101390, with a system reconfigured on the 2nd of June (Guix commit bb325c5611553a6db21ee7499ac07d5757d24fc3): --8<---------------cut here---------------start------------->8--- Generation 216 Jun 02 2021 10:14:19 (current) file name: /var/guix/profiles/system-216-link canonical file name: /gnu/store/apjg70083nc5xj816y0ff3r8ir9gh5py-system label: GNU with Linux-Libre 5.11.20 bootloader: grub root device: /dev/mapper/cryptroot kernel: /gnu/store/ghijd80qabdyf0p6jcich9ggnpwrbwxw-linux-libre-5.11.20/bzImage channels: sfl-packages: repository URL: https://gitlab.com/Apteryks/sfl-guix-channel branch: master commit: 37d017573350b64f8a8c992530153f42806b6a6f guix: repository URL: https://git.savannah.gnu.org/git/guix.git branch: master commit: bb325c5611553a6db21ee7499ac07d5757d24fc3 configuration file: /gnu/store/qvhl7ya2xn4gr9mn29hg93p1dcbdlyfy-configuration.scm --8<---------------cut here---------------end--------------->8--- with the guix-daemon running being: --8<---------------cut here---------------start------------->8--- /gnu/store/9zh3bg8d4y08jnkqyrk6xczahiahhcy4-guix-1.3.0-1.771b866/bin/guix-daemon 29920 guixbuild --max-silent-time 0 --timeout 0 --log-compression none --discover=no --substitute-urls http://127.0.0.1:8080 https://ci.guix.gnu.org --max-jobs=4 --8<---------------cut here---------------end--------------->8--- Attempting to update my profile keeps failing with: --8<---------------cut here---------------start------------->8--- $ ./pre-inst-env guix package -m ~/stow/guix/manifest.scm -L ~/src/sfl-guix-channel/ --substitute-urls=https://ci.guix.gnu.org --no-offload ;;; note: source file /home/maxim/src/guix-master/gnu/packages/networking.scm ;;; newer than compiled /home/maxim/src/guix-master/gnu/packages/networking.go ;;; note: source file /home/maxim/src/guix-master/gnu/packages/networking.scm ;;; newer than 
compiled /run/current-system/profile/lib/guile/3.0/site-ccache/gnu/packages/networking.go The following packages will be installed: acpi 1.7 adb 7.1.2_r36 adwaita-icon-theme 3.34.3 alsa-utils 1.2.4 [...] xrandr 1.5.1 xrdb 1.2.0 xsetroot 1.1.2 yelp 3.32.2 122.8 MB will be downloaded libreoffice-6.4.7.2 117.1MiB 344KiB/s 03:04 [######### ] 52.7%guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet. substitution of /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 failed guix package: error: some substitutes for the outputs of derivation `/gnu/store/9f8sffldy39mprihx6xgrs7hys9j75jm-libreoffice-6.4.7.2.drv' failed (usually happens due to networking issues); try `--fallback' to build derivation from source --8<---------------cut here---------------end--------------->8--- I'm attaching my (large!) profile manifest. It depends on the https://gitlab.com/Apteryks/sfl-guix-channel channel, which just adds 3 Python packages. You could comment out the "sflvault-client" package from the manifest to lift that requirement. Thanks, Maxim [-- Attachment #2: manifest.scm --] [-- Type: text/plain, Size: 4318 bytes --] (use-modules (gnu packages) (gnu packages emacs) (guix build-system emacs) (guix profiles)) (concatenate-manifests (list ;;; Emacs packages. 
(specifications->manifest '("emacs" "emacs-auctex" "emacs-bash-completion" "emacs-bbdb" "emacs-cmake-mode" "emacs-company" "emacs-company-quickhelp" "emacs-counsel" "emacs-counsel-bbdb" "emacs-csv-mode" "emacs-debbugs" "emacs-diff-hl" "emacs-el-mock" "emacs-elpy" "emacs-emms" "emacs-ggtags" "emacs-go-mode" "emacs-grep-a-lot" "emacs-groovy-modes" "emacs-guix" "emacs-htmlize" "emacs-ivy" "emacs-magit" "emacs-markdown-mode" "emacs-nix-mode" "emacs-org" "emacs-org-reveal" "emacs-paredit" "emacs-php-mode" "emacs-pdf-tools" "emacs-qml-mode" "emacs-realgud" "emacs-rpm-spec-mode" "emacs-sr-speedbar" "emacs-string-inflection" "emacs-swiper" "emacs-w3m" "emacs-ws-butler" "emacs-yaml-mode" "emacs-yasnippet" "emacs-yasnippet-snippets")) ;; Other software. (specifications->manifest '("adb" "acpi" "adwaita-icon-theme" "alsa-utils" "anthy" "arc-icon-theme" "arc-theme" "aspell" "aspell-dict-en" "aspell-dict-fr" "autoconf" "automake" "autossh" "bash" "bc" "beep" "bind:utils" ;for 'dig' "bluez" "bridge-utils" "cheese" "compsize" "cqfd" "cryptsetup" "curl" "dbus" "dconf" "ddcutil" "diffoscope" "docker-cli" "dosfstools" "evince" "file" "font-adobe-source-han-sans" "font-dejavu" "font-google-roboto" "font-hack" "gcc-toolchain" "gdb" "geeqie" "ghostscript-with-x" "gimp" "git" "git:send-email" "glibc-locales" "global" "gnome-bluetooth" "gnome-boxes" "gnu-standards" "gnucash" "gnucash:doc" "gnupg" "graphviz" "grub" ;for the manual "gtk-engines" "guile" "guile-lib" "guile-readline" "guile-sqlite3" "guile-ssh" "hackneyed-x11-cursors" "hicolor-icon-theme" "hunspell" "hunspell-dict-fr" "ibus" "ibus-anthy" "icecat" "imagemagick" "inetutils" "inkscape" "iotop" "jack" "jami-gnome" "jami-qt" "nethogs" ;pre-process bandwith monitoring "jnettop" ;bandwidth monitoring "keepassxc" "libjpeg" "libmtp" "libpcap" "libreoffice" "libssh" "libx11" "linphone-desktop" "lm-sensors" "lsof" "ltrace" "lvm2" ;for dmsetup "maim" ;take screenshots "make" "man-pages" "mesa-utils" "moreutils" "mpv" "mtr" "ncftp" ;for 
gnupload "nmap" "openssh" "openvpn" "parted" "pavucontrol" "perl" "pinentry" "pkg-config" "poppler" "pulseaudio" "pv" "python" "python-wrapper" "qemu" "recutils" "rofi" "rsync" "rtorrent" "screen" "setxkbmap" "shepherd" "sicp" "smartmontools" "spacefm" "stow" "strace" "sysstat" ;for iostat "tcpdump" "the-silver-searcher" ;ag "time" ;aliased to time+ "transmission" "transmission:gui" "tree" "unzip" "vinagre" "vorbis-tools" "weechat" "wget" "workrave" "wpa-supplicant" "xclip" "xdpyinfo" "xdg-utils" "xev" "xmodmap" "xournal" "xrandr" "xrdb" "xsetroot" "yelp" "gxtuner" "shellcheck" "wireguard-tools" "wireshark")) ;; SFL stuff -- todo extract in separate manifest (specifications->manifest '("ansible" "docker-compose" "emacs-adoc-mode" "emacs-clang-format" "emacs-clang-rename" "emacs-feature-mode" "picocom" "python-git-review" "sflvault-client" "sshpass" "ungoogled-chromium" "ddrescue"))))
[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

> 122.8 MB will be downloaded
> libreoffice-6.4.7.2 117.1MiB 344KiB/s 03:04 [######### ] 52.7%guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet.
> substitution of /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 failed
> guix package: error: some substitutes for the outputs of derivation `/gnu/store/9f8sffldy39mprihx6xgrs7hys9j75jm-libreoffice-6.4.7.2.drv' failed (usually happens due to networking issues); try `--fallback' to build derivation from source
> --8<---------------cut here---------------end--------------->8---
>

I often have the same problem when I do "guix package -u".
(Same error message, same package libreoffice, same derivation)
(Usually libreoffice, sometimes with other packages as well.)

I don't know the cause though.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 260 bytes --]
Hi Maxim{,e}!

Maxime Devos <maximedevos@telenet.be> skribis:

>> 122.8 MB will be downloaded
>> libreoffice-6.4.7.2 117.1MiB 344KiB/s 03:04 [######### ] 52.7%guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet.
>> substitution of /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 failed
>> guix package: error: some substitutes for the outputs of derivation `/gnu/store/9f8sffldy39mprihx6xgrs7hys9j75jm-libreoffice-6.4.7.2.drv' failed (usually happens due to networking issues); try `--fallback' to build derivation from source
>> --8<---------------cut here---------------end--------------->8---
>>
>
> I often have the same problem when I do "guix package -u".
> (Same error message, same package libreoffice, same derivation)
> (Usually libreoffice, sometimes with other packages as well.)

As a first step, can you reproduce the bug like this:

  while echo substitute /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 /tmp/t | guix substitute --substitute ; do chmod -R +w /tmp/t && rm -rf /tmp/t; done

?

FWIW, I can’t seem to reproduce it with:

--8<---------------cut here---------------start------------->8---
$ guix describe
Generacio 185 Jun 07 2021 15:07:46 (nuna)
  guix e3611cc
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: e3611cc412e7b1c750a56d17fb1b7cde684baa3f
--8<---------------cut here---------------end--------------->8---

TIA,
Ludo’.
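A bounded variant of the loop above may be easier to leave running unattended, since it stops after a fixed number of attempts and reports which attempt failed. This is only a sketch, assuming bash; `repeat_until_failure`, `substitute_once` and `clean_tmp` are made-up helper names, not Guix commands, and the `guix substitute --substitute` invocation is copied from the loop as posted.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of Guix.

# Store item from the report; set ITEM in the environment to try another one.
ITEM=${ITEM:-/gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2}

# One substitution attempt, same command as in the original loop.
substitute_once() {
    echo "substitute $ITEM /tmp/t" | guix substitute --substitute
}

# Remove the unpacked item so that the next attempt starts from scratch.
clean_tmp() {
    chmod -R +w /tmp/t && rm -rf /tmp/t
}

# repeat_until_failure MAX CLEANUP COMMAND...
# Run COMMAND up to MAX times, calling CLEANUP after each success.
# Return 1 at the first failure, 0 if every attempt succeeded.
repeat_until_failure() {
    local max=$1 cleanup=$2 n
    shift 2
    for ((n = 1; n <= max; n++)); do
        if ! "$@"; then
            echo "failed on attempt $n of $max" >&2
            return 1
        fi
        "$cleanup"
    done
    echo "no failure in $max attempts" >&2
}

# Example (not run here): repeat_until_failure 50 clean_tmp substitute_once
```

Knowing the attempt number gives a rough failure rate to compare between machines or networks.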
[-- Attachment #1: Type: text/plain, Size: 4456 bytes --] X-Debbugs-CC: ludo@gnu.org 48903@debbugs.gnu.org maxim.cournoyer@gmail.com tags: + substituter Ludovic Courtès schreef op vr 11-06-2021 om 17:09 [+0200]: > Hi Maxim{,e}! > > Maxime Devos <maximedevos@telenet.be> skribis: > > > > 122.8 MB will be downloaded > > > libreoffice-6.4.7.2 117.1MiB 344KiB/s 03:04 [######### ] 52.7%guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet. > > > substitution of /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 failed > > > guix package: error: some substitutes for the outputs of derivation `/gnu/store/9f8sffldy39mprihx6xgrs7hys9j75jm-libreoffice-6.4.7.2.drv' failed (usually happens due to networking issues); try `--fallback' to build derivation from source > > > --8<---------------cut here---------------end--------------->8--- > > > > > > > I often have the same problem when I do "guix package -u". > > (Same error message, same package libreoffice, same derivation) > > (Usually libreoffice, sometimes with other packages as well.) > > As a first step, can you reproduce the bug like this: > > while echo substitute /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 /tmp/t | guix substitute --substitute ; do chmod -R +w /tmp/t && rm -rf /tmp/t; done > > ? 
I cannot reproduce either, with $ guix describe Generatie 97 23:40:15 19 jun 2021 (huidig) guix 3aabe51 bewaarplaats-URL: https://git.savannah.gnu.org/git/guix.git tak: master commit: 3aabe51e8c09b9a2a87c03c40e3cc0f90d531bfd nonguix d81564f bewaarplaats-URL: https://gitlab.com/nonguix/nonguix tak: master commit: d81564f21e7d8800e6f6187fe2e1f6476e06bc30 so I wondered whether it is a networking issue, so I disabled & re-enabled wireless networking during the substitution and encountered a (to my knowledge) previously-unknown backtrace: $ while (echo substitute /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 /tmp/t && echo substitute /gnu/store/j0j7z6ckarjs9yi77sncszbmdgy38s70-guix-1.3.0-4.4985a42 /tmp/u) | guix substitute --substitute ; do chmod -R +w /tmp/t && rm -rf /tmp/t && chmod -R +w /tmp/u && rm -rf /tmp/u ; done http://ci.guix.gnu.org/nar/lzip/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 aan het binnenhalen ... libreoffice-6.4.7.2 117.1MiB 1.7MiB/s 01:08 [##################] 100.0% http://ci.guix.gnu.org/nar/lzip/j0j7z6ckarjs9yi77sncszbmdgy38s70-guix-1.3.0-4.4985a42 aan het binnenhalen ... guix-1.3.0-4.4985a42 36.1MiB 223KiB/s 01:53 [############ ] 67.9%Backtrace: In guix/serialization.scm: 468:33 19 (read "/tmp/u/share/guile/site/3.0" _) 468:33 18 (read "/tmp/u/share/guile/site/3.0/gnu" _) 468:33 17 (read "/tmp/u/share/guile/site/3.0/gnu/packages" _) 442:24 16 (read "/tmp/u/share/guile/site/3.0/gnu/packages/crates…" …) 525:24 15 (_ "/tmp/u/share/guile/site/3.0/gnu/packages/crates-io…" …) In ice-9/ports.scm: 433:17 14 (call-with-output-file _ _ #:binary _ #:encoding _) In guix/serialization.scm: 247:20 13 (dump #<input: string 7f0d0b7d3d90> #<output: /tmp/u/s…> …) In unknown file: 12 (get-bytevector-n! #<input: string 7f0d0b7d3d90> # 0 #) In gcrypt/hash.scm: 223:13 11 (read! #vu8(32 32 40 97 114 103 117 109 101 110 116 # …) …) In unknown file: 10 (get-bytevector-n! #<input: string 7f0d0b7d3e00> # 0 #) In lzlib.scm: 501:4 9 (lzread! 
#<lz-decoder 7f0d0b7d5ef0> #<input: string 7f…> …) In unknown file: 8 (get-bytevector-n #<input: string 7f0d0b7d3e70> 65537) In guix/progress.scm: 368:31 7 (read! _ _ _) In unknown file: 6 (get-bytevector-n! #<input: string 7f0d0a15e4d0> # 0 #) In web/response.scm: 95:2 5 (read! _ _ _) In ice-9/boot-9.scm: 1685:16 4 (raise-exception _ #:continuable? _) 1685:16 3 (raise-exception _ #:continuable? _) 1780:13 2 (_ #<&compound-exception components: (#<&error> #<&irri…>) 1685:16 1 (raise-exception _ #:continuable? _) 1685:16 0 (raise-exception _ #:continuable? _) ice-9/boot-9.scm:1685:16: In procedure raise-exception: Throw to key `bad-response' with args `("EOF while reading response body: ~a bytes of ~a" (26017808 37842870))'. This seems like a separate issue from #48903 though, so I opened a new bug report. Greetings, Maxime. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 260 bytes --]
Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

> so I disabled & re-enabled wireless networking during the substitution
> and encountered a (to my knowledge) previously-unknown backtrace:
>
> $ while (echo substitute /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 /tmp/t && echo substitute /gnu/store/j0j7z6ckarjs9yi77sncszbmdgy38s70-guix-1.3.0-4.4985a42 /tmp/u) | guix
> substitute --substitute ; do chmod -R +w /tmp/t && rm -rf /tmp/t && chmod -R +w /tmp/u && rm -rf /tmp/u ; done

[...]

> In lzlib.scm:
>     501:4  9 (lzread! #<lz-decoder 7f0d0b7d5ef0> #<input: string 7f…> …)
> In unknown file:
>            8 (get-bytevector-n #<input: string 7f0d0b7d3e70> 65537)
> In guix/progress.scm:
>    368:31  7 (read! _ _ _)
> In unknown file:
>            6 (get-bytevector-n! #<input: string 7f0d0a15e4d0> # 0 #)
> In web/response.scm:
>      95:2  5 (read! _ _ _)
> In ice-9/boot-9.scm:
>   1685:16  4 (raise-exception _ #:continuable? _)
>   1685:16  3 (raise-exception _ #:continuable? _)
>   1780:13  2 (_ #<&compound-exception components: (#<&error> #<&irri…>)
>   1685:16  1 (raise-exception _ #:continuable? _)
>   1685:16  0 (raise-exception _ #:continuable? _)
>
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> Throw to key `bad-response' with args `("EOF while reading response body: ~a bytes of ~a" (26017808 37842870))'.

Ah indeed, this is poorly handled.

I’m not really sure how to address it.  I/O ports are a nice abstraction
as it allows you to transparently read “streams” from any medium, but as
always, that also comes with opacity where the call site is not supposed
to know what kind of exceptions might be thrown deep down.

Thoughts?

Ludo’.
Hello Ludovic, Ludovic Courtès <ludo@gnu.org> writes: [...] > As a first step, can you reproduce the bug like this: > > while echo substitute > /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 /tmp/t > | guix substitute --substitute ; do chmod -R +w /tmp/t && rm -rf > /tmp/t; done > > ? > > FWIW, I can’t seem to reproduce it with: > > $ guix describe > Generacio 185 Jun 07 2021 15:07:46 (nuna) > guix e3611cc > repository URL: https://git.savannah.gnu.org/git/guix.git > branch: master > commit: e3611cc412e7b1c750a56d17fb1b7cde684baa3f I can't seem to reproduce either. Perhaps the issue only arises when there are many things happening concurrently. My daemon runs with: --8<---------------cut here---------------start------------->8--- $ sudo ps -eF | grep guix-daemon root 25193 216 0 3074 1524 3 Jun28 ? 00:00:00 /gnu/store/vphx2839xv0qj9xwcwrb95592lzrrnx7-guix-1.3.0-3.50dfbbf/bin/guix-daemon 25178 guixbuild --max-silent-time 0 --timeout 0 --log-compression none --discover=no --substitute-urls http://127.0.0.1:8080 https://ci.guix.gnu.org --max-jobs=4--8<---------------cut here---------------end--------------->8--- I can rather easily (and annoyingly!) trigger the problem (and a few variations of it, it seems) with something like: $ packages=$(guix refresh -l protobuf | sed 's/^.*: //') $ guix build -v3 --keep-going $packages For example, running the above, I just got: --8<---------------cut here---------------start------------->8--- guix build: error: corrupt input while restoring archive from #<closed: file 7fc95acfc2a0> --8<---------------cut here---------------end--------------->8--- Does the above commands succeed on the first time on your end? If you have already lots of things cached, you can try for an architecture you don't often build for by adding the '--system=i686-linux' option; that should cause a massive amount of downloads, likely to trigger the problem. Perhaps also try to use --max-jobs=4. 
If you have ideas of how to debug this when I hit the issue I'm all ears :-). Thank you! Maxim
[-- Attachment #1: Type: text/plain, Size: 2055 bytes --] Hi, Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis: > $ sudo ps -eF | grep guix-daemon > root 25193 216 0 3074 1524 3 Jun28 ? 00:00:00 /gnu/store/vphx2839xv0qj9xwcwrb95592lzrrnx7-guix-1.3.0-3.50dfbbf/bin/guix-daemon 25178 guixbuild --max-silent-time 0 --timeout 0 --log-compression none --discover=no --substitute-urls http://127.0.0.1:8080 https://ci.guix.gnu.org --max-jobs=4--8<---------------cut here---------------end--------------->8--- > > I can rather easily (and annoyingly!) trigger the problem (and a few > variations of it, it seems) with something like: > > $ packages=$(guix refresh -l protobuf | sed 's/^.*: //') > $ guix build -v3 --keep-going $packages > > For example, running the above, I just got: > > guix build: error: corrupt input while restoring archive from #<closed: > file 7fc95acfc2a0> > --8<---------------cut here---------------end--------------->8--- > > Does the above commands succeed on the first time on your end? If you > have already lots of things cached, you can try for an architecture you > don't often build for by adding the '--system=i686-linux' option; that > should cause a massive amount of downloads, likely to trigger the > problem. Perhaps also try to use --max-jobs=4. I’ve tried that, with --max-jobs=4, and it fills my disk just fine. :-/ > If you have ideas of how to debug this when I hit the issue I'm all ears > :-). The attached patch substitutes a number of store items in a row; run: guix repl -- substitute-stress.scm and it’ll fill /tmp/substitute-test with 200 substitutes, which should be equivalent to the kind of stress test you had above. It doesn’t crash for me. There are a few “error: no valid substitute for /gnu/store/…” errors, but these are expected: was ask for substitutes for 200 packages without first checking whether substitutes are available. Could you run it and report back? You can try with more packages, different substitute URLs, etc. TIA! Ludo’. 
[-- Attachment #2: the stress test --] [-- Type: text/plain, Size: 2164 bytes --] (use-modules (guix) (gnu packages) (guix scripts substitute) (guix grafts) (guix build utils) (srfi srfi-1) (ice-9 match) (ice-9 threads)) (define test-directory "/tmp/substitute-test") (define packages ;; Subset of packages for which we request substitutes. (take (fold-packages cons '()) 200)) (define (spawn-substitution-thread input urls) "Spawn a 'guix substitute' thread that reads commands from INPUT and uses URLS as the substitute servers." (call-with-new-thread (lambda () (parameterize ((%reply-file-descriptor #f) (current-input-port input)) (setenv "_NIX_OPTIONS" (string-append "substitute-urls=" (string-join urls))) (let loop () (format (current-error-port) "starting substituter~%") ;; Catch "no valid substitute" errors. (catch 'quit (lambda () (guix-substitute "--substitute")) (const #f)) (unless (eof-object? (peek-char input)) (loop))))))) (match (pipe) ((input . output) (let ((thread (spawn-substitution-thread input %default-substitute-urls))) ;; Remove the test directory. (when (file-exists? test-directory) (for-each make-file-writable (find-files test-directory #:directories? #t)) (delete-file-recursively test-directory)) (mkdir-p test-directory) (parameterize ((%graft? #false)) (with-store store ;; Ask for substitutes for PACKAGES. (for-each (lambda (package n) (define item (run-with-store store (package-file package))) (format output "substitute ~a ~a/~a~%" item test-directory n)) packages (iota (length packages)))) (format #t "sent ~a substitution requests...~%" (length packages)) (close-port output) ;; Wait for substitution to complete. (join-thread thread)))))
Hello!
Ludovic Courtès <ludo@gnu.org> writes:
> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> $ sudo ps -eF | grep guix-daemon
>> root 25193 216 0 3074 1524 3 Jun28 ? 00:00:00
>> /gnu/store/vphx2839xv0qj9xwcwrb95592lzrrnx7-guix-1.3.0-3.50dfbbf/bin/guix-daemon
>> 25178 guixbuild --max-silent-time 0 --timeout 0 --log-compression
>> none --discover=no --substitute-urls http://127.0.0.1:8080
>> https://ci.guix.gnu.org --max-jobs=4--8<---------------cut
>> here---------------end--------------->8---
>>
>> I can rather easily (and annoyingly!) trigger the problem (and a few
>> variations of it, it seems) with something like:
>>
>> $ packages=$(guix refresh -l protobuf | sed 's/^.*: //')
>> $ guix build -v3 --keep-going $packages
>>
>> For example, running the above, I just got:
>>
>> guix build: error: corrupt input while restoring archive from #<closed:
>> file 7fc95acfc2a0>
>> --8<---------------cut here---------------end--------------->8---
>>
>> Does the above commands succeed on the first time on your end? If you
>> have already lots of things cached, you can try for an architecture you
>> don't often build for by adding the '--system=i686-linux' option; that
>> should cause a massive amount of downloads, likely to trigger the
>> problem. Perhaps also try to use --max-jobs=4.
>
> I’ve tried that, with --max-jobs=4, and it fills my disk just fine. :-/
>
>> If you have ideas of how to debug this when I hit the issue I'm all ears
>> :-).
>
> The attached patch substitutes a number of store items in a row; run:
>
> guix repl -- substitute-stress.scm
>
> and it’ll fill /tmp/substitute-test with 200 substitutes, which should
> be equivalent to the kind of stress test you had above.
>
> It doesn’t crash for me. There are a few “error: no valid substitute
> for /gnu/store/…” errors, but these are expected: was ask for
> substitutes for 200 packages without first checking whether substitutes
> are available.
>
> Could you run it and report back?
>
> You can try with more packages, different substitute URLs, etc.
>
> TIA!
>
> Ludo’.
[...]
I've tried with the following modified version which runs multiple
threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet
to trigger it, although the hard drive is grinding heavily:
--8<---------------cut here---------------start------------->8---
(use-modules (guix) (gnu packages)
             (guix scripts substitute)
             (guix grafts)
             (guix build utils)
             (srfi srfi-1)
             (ice-9 match)
             (ice-9 threads))

(define test-directory "/tmp/substitute-test")
(define max-jobs 4)

(define packages
  ;; Subset of packages for which we request substitutes.
  (append (map specification->package '("libreoffice"
                                        "ungoogled-chromium"
                                        "openjdk"
                                        "texmacs"))
          (take (fold-packages cons '()) 1000)))

(define (spawn-substitution-thread input urls)
  "Spawn a 'guix substitute' thread that reads commands from INPUT and uses
URLS as the substitute servers."
  (call-with-new-thread
   (lambda ()
     (parameterize ((%reply-file-descriptor #f)
                    (current-input-port input))
       (setenv "_NIX_OPTIONS"
               (string-append "substitute-urls=" (string-join urls)))
       (let loop ()
         (format (current-error-port) "starting substituter~%")
         ;; Catch "no valid substitute" errors.
         (catch 'quit
           (lambda ()
             (guix-substitute "--substitute"))
           (const #f))
         (unless (eof-object? (peek-char input))
           (loop)))))))

(for-each (lambda (job)
            (match (pipe)
              ((input . output)
               (let ((test-directory* (string-append test-directory "-"
                                                     (number->string job)))
                     (thread (spawn-substitution-thread
                              input %default-substitute-urls)))
                 ;; Remove the per-job test directory.
                 (when (file-exists? test-directory*)
                   (for-each (lambda (f)
                               (false-if-exception (make-file-writable f)))
                             (find-files test-directory* #:directories? #t))
                   (delete-file-recursively test-directory*))
                 (mkdir-p test-directory*)
                 (parameterize ((%graft? #false))
                   (with-store store
                     ;; Ask for substitutes for PACKAGES.
                     (for-each (lambda (package n)
                                 (define item
                                   (run-with-store store
                                     (package-file package)))
                                 (format output "substitute ~a ~a/~a~%"
                                         item test-directory* n))
                               packages
                               (iota (length packages))))
                   (format #t "sent ~a substitution requests...~%"
                           (length packages))
                   (close-port output)
                   ;; Wait for substitution to complete.
                   (join-thread thread))))))
          (iota max-jobs))
--8<---------------cut here---------------end--------------->8---
I wonder if there's something more happening in the real scenario
(validating signatures when putting things in the store? or something
similar) that may have a role in the failure.
That's a tough nut to crack!
I'll keep looking for clues.
Thanks for your time!
Maxim
Hi,
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> I've tried with the following modified version which runs multiple
> threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet
> to trigger it, although the hard drive is grinding heavily:
Note that ‘--max-jobs=4’ leads guix-daemon to spawn 4 ‘guix substitute’
processes, which is not what the script is doing here.
Are the other conditions the same, for instance same network, etc.?
Thanks,
Ludo’.
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> I've tried with the following modified version which runs multiple
>> threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet
>> to trigger it, although the hard drive is grinding heavily:
>
> Note that ‘--max-jobs=4’ leads guix-daemon to spawn 4 ‘guix substitute’
> processes, which is not what the script is doing here.

Oh!  I had overlooked that.  What the modified version did is create
threads rather than processes, right?

> Are the other conditions the same, for instance same network, etc.?

Yes!

Maxim
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Hi,
>>
>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>
>>> I've tried with the following modified version which runs multiple
>>> threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet
>>> to trigger it, although the hard drive is grinding heavily:
>>
>> Note that ‘--max-jobs=4’ leads guix-daemon to spawn 4 ‘guix substitute’
>> processes, which is not what the script is doing here.
>
> Oh! I had overlooked that. What the modified version did is create
> threads rather than processes, right?
Yes.
So I’m not sure how to better test this. Perhaps you could try
introducing random delays in the loop (which could cause connections to
go stale), using different substitute URLs, things like that.
Thanks,
Ludo’.
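One way to try the random-delay idea without touching the Scheme script is to feed a single `guix substitute --substitute` process from a shell function that pauses between requests, so the same substituter process sits idle for a while before the next download. This is a minimal sketch, assuming bash and assuming the `substitute ITEM DESTINATION` line format used earlier in this thread; `feed_with_delays`, `MAX_DELAY` and `DEST` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: feed_with_delays, MAX_DELAY and DEST are hypothetical names.

MAX_DELAY=${MAX_DELAY:-5}               # longest pause between requests, in seconds
DEST=${DEST:-/tmp/substitute-test}      # where the substituter unpacks items

# feed_with_delays ITEM...
# Print one "substitute ITEM DEST/N" request per store item, sleeping a
# random 0..MAX_DELAY seconds before each request.
feed_with_delays() {
    local n=0 item
    for item in "$@"; do
        sleep $((RANDOM % (MAX_DELAY + 1)))
        echo "substitute $item $DEST/$n"
        n=$((n + 1))
    done
}

# Example (not run here):
#   mkdir -p "$DEST"
#   feed_with_delays /gnu/store/...-foo /gnu/store/...-bar \
#     | guix substitute --substitute
```

Raising `MAX_DELAY` well past a few seconds might be needed before an idle connection is actually dropped; the right value depends on the server and the network in between.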
[-- Attachment #1: Type: text/plain, Size: 1298 bytes --] Hi! Ludovic Courtès <ludo@gnu.org> writes: > Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis: > >> Ludovic Courtès <ludo@gnu.org> writes: >> >>> Hi, >>> >>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis: >>> >>>> I've tried with the following modified version which runs multiple >>>> threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet >>>> to trigger it, although the hard drive is grinding heavily: >>> >>> Note that ‘--max-jobs=4’ leads guix-daemon to spawn 4 ‘guix substitute’ >>> processes, which is not what the script is doing here. >> >> Oh! I had overlooked that. What the modified version did is create >> threads rather than processes, right? > > Yes. > > So I’m not sure how to better test this. Perhaps you could try > introducing random delays in the loop (which could cause connections to > go stale), using different substitute URLs, things like that. I've tried some to reproduce the issue with the modified scripts below, but in vain. I'm not sure if my delay is inserted at the right place. I also suspect that my attempt to shuffle the substitute-urls is not really useful, as that's probably what would have happened anyway (although I haven't followed in the code deeply enough to confirm). [-- Attachment #2: substitute-stress-launcher.sh --] [-- Type: application/x-sh, Size: 137 bytes --] [-- Attachment #3: substitute-stress.scm --] [-- Type: application/octet-stream, Size: 3237 bytes --] (use-modules (guix) (gnu packages) (guix scripts substitute) (guix grafts) (guix build utils) (srfi srfi-1) (srfi srfi-27) (ice-9 match) (ice-9 threads)) (setvbuf (current-input-port) 'line) (define test-directory "/tmp/substitute-test") (define job-id (getenv "JOB_ID")) (define packages ;; Subset of packages for which we request substitutes. 
(append (map specification->package '("libreoffice" "ungoogled-chromium" "openjdk" "texmacs")) (take (fold-packages cons '()) 200))) (define (spawn-substitution-thread input urls) "Spawn a 'guix substitute' thread that reads commands from INPUT and uses URLS as the substitute servers." (call-with-new-thread (lambda () (parameterize ((%reply-file-descriptor #f) (current-input-port input)) (setenv "_NIX_OPTIONS" (string-append "substitute-urls=" (string-join urls))) (let loop () (format (current-error-port) "starting substituter~%") ;; Catch "no valid substitute" errors. (catch 'quit (lambda () (guix-substitute "--substitute")) (const #f)) (unless (eof-object? (peek-char input)) (loop))))))) (match (pipe) ((input . output) (let ((test-directory* (string-append test-directory "-" job-id)) (thread (spawn-substitution-thread input %default-substitute-urls))) (format (current-output-port) "starting job ~a...~%" job-id) ;; Remove the test directory. (when (file-exists? test-directory*) (for-each (lambda (f) (false-if-exception (make-file-writable f))) (find-files test-directory* #:directories? #t)) (delete-file-recursively test-directory*)) (mkdir-p test-directory*) (parameterize ((%graft? #false)) (with-store store ;; Ask for substitutes for PACKAGES. (for-each (lambda (package n) ;; Random sleep. (sleep (random-integer 6)) (define item (run-with-store store (let ((substitute-url (list-ref %default-substitute-urls (random-integer (length %default-substitute-urls))))) (pk 'using-substitute-url substitute-url) (set-build-options store #:substitute-urls (list substitute-url)) (package-file package)))) (format output "substitute ~a ~a/~a~%" item test-directory* n)) packages (iota (length packages)))) (format #t "sent ~a substitution requests...~%" (length packages)) (close-port output) ;; Wait for substitution to complete. (join-thread thread))))) [-- Attachment #4: Type: text/plain, Size: 16 bytes --] Thanks, Maxim
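The launcher attachment is not shown inline. A launcher that starts several independent processes, which is closer to what `--max-jobs=4` does on the daemon side than threads are, might look like the following. This is a sketch under assumptions, not the attached script: it assumes bash, and `run_jobs` is a hypothetical helper; the only link to the script above is the `JOB_ID` environment variable that it reads with `getenv`.

```shell
#!/usr/bin/env bash
# Sketch only: run_jobs is a hypothetical helper, not the attached launcher.

# run_jobs N COMMAND...
# Start N copies of COMMAND as separate background processes, each with its
# own JOB_ID (0..N-1) in the environment, wait for all of them, and print
# how many exited with a non-zero status.
run_jobs() {
    local n=$1 i=0 pids="" pid failed=0
    shift
    while [ "$i" -lt "$n" ]; do
        ( JOB_ID=$i; export JOB_ID; "$@" ) &
        pids="$pids $!"
        i=$((i + 1))
    done
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}

# Example (not run here): run_jobs 4 guix repl -- substitute-stress.scm
```

Each copy then writes to its own `/tmp/substitute-test-JOB_ID` directory, so the processes do not clean up each other's files.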
Hi,
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> I've tried some to reproduce the issue with the modified scripts below,
> but in vain. I'm not sure if my delay is inserted at the right place.
> I also suspect that my attempt to shuffle the substitute-urls is not
> really useful, as that's probably what would have happened anyway
> (although I haven't followed in the code deeply enough to confirm).
Bah. :-/ Do the two of you still experience the bug initially reported
here in “real” conditions?
Are we sure we’re using the same Guix + Guile when running the stress
test and in real conditions?
Thanks for testing!
Ludo’.
Hello,
Ludovic Courtès <ludo@gnu.org> writes:
> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> I've tried some to reproduce the issue with the modified scripts below,
>> but in vain. I'm not sure if my delay is inserted at the right place.
>> I also suspect that my attempt to shuffle the substitute-urls is not
>> really useful, as that's probably what would have happened anyway
>> (although I haven't followed in the code deeply enough to confirm).
>
> Bah. :-/ Do the two of you still experience the bug initially reported
> here in “real” conditions?
>
> Are we sure we’re using the same Guix + Guile when running the stress
> test and in real conditions?
>
> Thanks for testing!
>
> Ludo’.
I've been doing builds on core-updates, which would previously trigger
this problem rather often, *without* suffering from the problem.
I consider it resolved. I'm not exactly sure how, which is not
satisfying, but I'm glad it's gone.
Thank you!
Closing.
Maxim
[-- Attachment #1.1.1: Type: text/plain, Size: 2182 bytes --]

> Ah indeed, this is poorly handled.
>
> I’m not really sure how to address it.  I/O ports are a nice abstraction
> as it allows you to transparently read “streams” from any medium, but as
> always, that also comes with opacity where the call site is not supposed
> to know what kind of exceptions might be thrown deep down.
>
> Thoughts?

About ‘as always, [...]’: [citation needed].  AFAICT, nowhere does the Guile
documentation state that call sites aren't supposed to know.  Also, that
seems a bad supposition to me if it prevents fixing the bug.  I would just
ignore that ‘not supposed to’.

I think this supposition needs some adjustment in order to be practical:
guix/scripts/substitute.scm has opened the network connection (via
http-client), so guix/scripts/substitute.scm is responsible for the
connection (unless it delegates, of course), so guix/scripts/substitute.scm
is supposed to know what to do about exceptions involving that connection
(unless it delegates).

That there are some intermediate modules before things are actually read
from the port is irrelevant -- substitute.scm opened the port, is
responsible for the port and knows best how to handle exceptional
situations involving that port.  Nothing lower-level has the right
context/information to make a good decision on how to handle the exception,
so no delegation is possible.  As such, guix/scripts/substitute.scm should
do it.

It's not 100% clear from the backtrace, but it appears that the exception
(from the perspective of guix/scripts/substitute.scm) happens at:

  ;; Procedure: process-substitution
  ;; Unpack the Nar at INPUT into DESTINATION.
  (define cpu-usage
    (with-cpu-usage-monitoring
     (restore-file hashed destination
                   #:dump-file (if (and destination-in-store?
                                        deduplicate?)
                                   dump-file/deduplicate*
                                   dump-file))))

This should then be wrapped in an error handler catching 'bad-response',
maybe reporting it, and switching over to the next URL.

Greetings,
Maxime.
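Until something like that is handled inside guix/scripts/substitute.scm, the same "move on to the next URL" idea can be approximated from the outside. The sketch below is not the fix proposed above: it is a client-side workaround, assuming bash, and assuming the daemon accepts the client-supplied `--substitute-urls` value. `try_urls` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: try_urls is a hypothetical helper; this is a client-side
# workaround, not the handler suggested for guix/scripts/substitute.scm.

GUIX=${GUIX:-guix}

# try_urls ITEM URL...
# Ask for ITEM using each substitute URL in turn and stop at the first
# success.  Return 1 if every URL failed.
try_urls() {
    local item=$1 url
    shift
    for url in "$@"; do
        if "$GUIX" build --substitute-urls="$url" "$item"; then
            return 0
        fi
        echo "warning: fetching $item from $url failed; trying next URL" >&2
    done
    echo "error: no substitute URL worked for $item" >&2
    return 1
}

# Example (not run here):
#   try_urls /gnu/store/...-libreoffice-6.4.7.2 \
#     http://127.0.0.1:8080 https://ci.guix.gnu.org
```

Unlike a handler in substitute.scm, this restarts the whole download on failure and cannot tell a truncated response apart from any other build error.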