unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#17591: Offload hook fails to send files
@ 2014-05-25 20:59 Ludovic Courtès
  2014-05-25 21:28 ` Ludovic Courtès
  0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2014-05-25 20:59 UTC (permalink / raw)
  To: 17591

We’ve been observing things like this (as of v0.6-257-g2508358):

--8<---------------cut here---------------start------------->8---
load on machine 'hydra.gnunet.org' is 0.52 (normalized: 0.17333333333333334)
@ build-started /gnu/store/wcgr1ddm3d6c166pmf5r17yfw2566w0d-libstdc++-4.8.3.drv - x86_64-linux /var/log/guix/drvs/wc//gr1ddm3d6c166pmf5r17yfw2566w0d-libstdc++-4.8.3.drv.bz2
sending 2 store files to 'hydra.gnunet.org'...
Backtrace:
In ice-9/boot-9.scm:
 157: 15 [catch #t #<catch-closure 1e3e2e0> ...]
In unknown file:
   ?: 14 [apply-smob/1 #<catch-closure 1e3e2e0>]
In ice-9/boot-9.scm:
  63: 13 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 12 [eval # #]
In ice-9/boot-9.scm:
2320: 11 [save-module-excursion #<procedure 1e70c80 at ice-9/boot-9.scm:3961:3 ()>]
3966: 10 [#<procedure 1e70c80 at ice-9/boot-9.scm:3961:3 ()>]
1645: 9 [%start-stack load-stack ...]
1650: 8 [#<procedure 1e6f000 ()>]
In unknown file:
   ?: 7 [primitive-load "/usr/local/bin/guix"]
In guix/ui.scm:
 630: 6 [run-guix-command archive "--import"]
In ice-9/boot-9.scm:
 157: 5 [catch srfi-34 #<procedure 2c9f920 at guix/ui.scm:198:2 ()> ...]
 157: 4 [catch system-error ...]
In guix/scripts/archive.scm:
 327: 3 [#<procedure 2ca0cf0 at guix/scripts/archive.scm:312:2 ()>]
In guix/store.scm:
 768: 2 [import-paths #<build-daemon 256.14 2c9b780> #<input: file 0>]
 404: 1 [process-stderr #<build-daemon 256.14 2c9b780> #<input: file 0>]
In unknown file:
   ?: 0 [bytevector-length #<eof>]

ERROR: In procedure bytevector-length:
ERROR: In procedure scm_c_bytevector_length: Wrong type argument in position 1 (expecting bytevector): #<eof>
builder for `/gnu/store/qbvyn4yq7m92mpp7wvwpx8gnk32863vw-binutils-2.24.drv' failed to produce output path `/gnu/store/k1hdhfx3b2lkcwkch70fcj998jg7qs7y-binutils-2.24'
--8<---------------cut here---------------end--------------->8---

If we try later the very same thing, it typically succeeds:

--8<---------------cut here---------------start------------->8---
@ build-started /gnu/store/wcgr1ddm3d6c166pmf5r17yfw2566w0d-libstdc++-4.8.3.drv - x86_64-linux /var/log/guix/drvs/wc//gr1ddm3d6c166pmf5r17yfw2566w0d-libstdc++-4.8.3.drv.bz2
sending 2 store files to 'hydra.gnunet.org'...
importing path `/gnu/store/5wxd03bpxbw6g4ik9nvq5242p2h1f9zw-binutils-2.24-guile-builder'
importing path `/gnu/store/qbvyn4yq7m92mpp7wvwpx8gnk32863vw-binutils-2.24.drv'
offloading '/gnu/store/qbvyn4yq7m92mpp7wvwpx8gnk32863vw-binutils-2.24.drv' to 'hydra.gnunet.org'...
@ build-remote /gnu/store/qbvyn4yq7m92mpp7wvwpx8gnk32863vw-binutils-2.24.drv hydra.gnunet.org
--8<---------------cut here---------------end--------------->8---

In the first case, ‘send-files’ appears to be failing early, before the
first store item has been transmitted, because the corresponding
“importing path” message hasn’t been printed yet (it’s printed once the
contents of the item being imported have been read–see local-store.cc.)

Ludo’.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#17591: Offload hook fails to send files
  2014-05-25 20:59 bug#17591: Offload hook fails to send files Ludovic Courtès
@ 2014-05-25 21:28 ` Ludovic Courtès
  2014-05-27 15:16   ` Ludovic Courtès
  0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2014-05-25 21:28 UTC (permalink / raw)
  To: 17591

This also manifests like this:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build hello --no-substitutes
The following derivations will be built:
   /gnu/store/xi7lgm7vyhzhl1z5dy1mba1yki27xkbm-hello-2.9.drv
   /gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv
hello from "/home/ludo/soft/etc/guix/machines.scm" (PID 1918)
process 1918 acquired build slot '/var/guix/offload/hydra.gnunet.org/0'
load on machine 'hydra.gnunet.org' is 0.2 (normalized: 0.06666666666666667)
@ build-started /gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv - x86_64-linux /var/log/guix/drvs/xa//n82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv.bz2
sending 2 store files to 'hydra.gnunet.org'...
offloading '/gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv' to 'hydra.gnunet.org'...
@ build-remote /gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv hydra.gnunet.org
guix build: error: open-file: No such file or directory: "/gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv"
derivation '/gnu/store/xan82azzisn7b69c0y9856wz2diw3ry1-grep-2.18.drv' offloaded to 'hydra.gnunet.org' failed with exit code 1
--8<---------------cut here---------------end--------------->8---

Ludo’.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#17591: Offload hook fails to send files
  2014-05-25 21:28 ` Ludovic Courtès
@ 2014-05-27 15:16   ` Ludovic Courtès
  2014-05-27 20:56     ` Ludovic Courtès
  2014-05-28 15:13     ` Ludovic Courtès
  0 siblings, 2 replies; 5+ messages in thread
From: Ludovic Courtès @ 2014-05-27 15:16 UTC (permalink / raw)
  To: 17591

I modified the offload hook and (guix build utils) so ‘cat’ is used
instead of ‘xz’.

When tracing sshd on the target machine and looking at the ‘guix archive
--import’ process in a failure case we see:

--8<---------------cut here---------------start------------->8---
21927 connect(10, {sa_family=AF_FILE, path="/usr/local/var/guix/daemon-socket/socket"}, 42) = 0
21927 write(10, "cxin\0\0\0\0", 8)      = 8
21927 read(10, "oixd\0\0\0\0", 8)       = 8
21927 read(10, "\16\1\0\0\0\0\0\0", 8)  = 8
21927 write(10, "\f\1\0\0\0\0\0\0", 8)  = 8
21927 write(10, "\1\0\0\0\0\0\0\0", 8)  = 8
21927 read(10, "stla\0\0\0\0", 8)       = 8
21927 write(10, "\33\0\0\0\0\0\0\0", 8) = 8
21927 read(10, "atad\0\0\0\0", 8)       = 8
21927 read(10, "\0\200\0\0\0\0\0\0", 8) = 8
21927 read(0, "\1\0\0\0\0\0\0\0\r\0\0\0\0\0\0\0nix-archive-1\0\0\0\1\0\0\0\0\0\0\0(\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0type\0\0\0\0\7\0\0\0\0\0\0\0regular\0\10\0\0\0\0\0\0\0contents\255\10\0\0\0\0\0\0(begin (use-modules (guix build gnu-build-system) (guix build utils)) (define %output (getenv \"out\")) (define %outputs (map (lambd"..., 4096) = 4096
21927 read(0, "C1B08ED62008395B0764CD9B55E80DA0A2B61C698DC27EA98E68BB576ACFC2B91B4D7283E7D960948D049D6E3C4CB1F489B460A120A4BB6C04A843FD3A67454136DE61CF68A927871EFFA9141BD372A748593C703E0301F039A9E674C50301BFC385BABE5B154250E7D57B82DB31F1E1AC696F870D"..., 4096) = 4096
21927 read(0,  <unfinished ...>

[...]

21927 <... read resumed> "ile-builder\0\0\0\0\0;\0\0\0\0\0\0\0/gnu/store/ip4w122y4wlkw9n1nb6g2sbz4ql0m8dy-patch-2.7.1.drv\0\0\0\0\0009\0\0\0\0\0\0\0/gnu/store/j6gq2p03mxjlmhmamg2mxixdf13djd60-grep-2.18.drv\0\0\0\0\0\0\0008\0\0\0\0\0\0\0/gnu/store/k3azrhdbjrdhjc3vlx154069imcql8jn-bash-4.3.drv;\0\0\0\0\0\0\0/g"..., 4096) = 2880
21926 <... write resumed> )             = 2880
21927 read(0,  <unfinished ...>
21926 read(0, "", 65536)                = 0
21926 close(0)                          = 0
21926 close(1 <unfinished ...>
21927 <... read resumed> "", 4096)      = 0
21926 <... close resumed> )             = 0
21926 close(2)                          = 0
21927 write(10, "@+\0\0\0\0\0\0", 8 <unfinished ...>
21926 exit_group(0)                     = ?
21927 <... write resumed> )             = 8
21927 write(10, "\1\0\0\0\0\0\0\0\0000\213\1\0\0\0\0nix-archive-1\0\0\0\1\0\0\0\0\0\0\0 0\213\1\0\0\0\0\4\0\0\0\0\0\0\0type\0\0\0\0\7\0\0\0\0\0\0\0@0\213\1\0\0\0\0\10\0\0\0\0\0\0\0contents\255\10\0\0\0\0\0\0`0\213\1\0\0\0\0use-modules (guix build \2000\213\1\0\0\0\0d-system) (guix build ut\2400\213\1\0\0\0\0efine %output (getenv \"o\3000\213\1\0\0\0\0efine %outputs (map (lam\3400"..., 11072 <unfinished ...>
21925 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 21926
--8<---------------cut here---------------end--------------->8---

It reads 11072 bytes (as expected) from stdin, but by the time it writes
them on the socket (the last write(10, ...) call), those 11072 bytes got
corrupted.

The above snippet corresponds to this part of store.scm:

--8<---------------cut here---------------start------------->8---
          ((= k %stderr-read)
           ;; Read a byte stream from USER-PORT.
           (let* ((max-len (read-int p))
                  (data    (get-bytevector-n user-port max-len))
                  (len     (bytevector-length data)))
             (write-int len p)
             (put-bytevector p data)
             (write-padding len p)
             #f))
--8<---------------cut here---------------end--------------->8---

where ‘data’ is the corrupted bytevector.

The good news is that I can reproduce it like this:

--8<---------------cut here---------------start------------->8---
$ guix archive --export $(guix build -d coreutils) > t.nar
$ while guix archive --import <t.nar ; do : ; done
importing path `/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv'
importing path `/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv'
importing path `/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv'
[... wait ...]
Backtrace:
In ice-9/boot-9.scm:
 157: 15 [catch #t #<catch-closure 2062ce0> ...]
In unknown file:
   ?: 14 [apply-smob/1 #<catch-closure 2062ce0>]
In ice-9/boot-9.scm:
  63: 13 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 12 [eval # #]
In ice-9/boot-9.scm:
2401: 11 [save-module-excursion #<procedure 207f940 at ice-9/boot-9.scm:4045:3 ()>]
4050: 10 [#<procedure 207f940 at ice-9/boot-9.scm:4045:3 ()>]
1724: 9 [%start-stack load-stack ...]
1729: 8 [#<procedure 2096ea0 ()>]
In unknown file:
   ?: 7 [primitive-load "/home/ludo/soft/bin/guix"]
In guix/ui.scm:
 631: 6 [run-guix-command archive "--import"]
In ice-9/boot-9.scm:
 157: 5 [catch srfi-34 #<procedure 3018b00 at guix/ui.scm:198:2 ()> ...]
 157: 4 [catch system-error ...]
In guix/scripts/archive.scm:
 327: 3 [#<procedure 3019a80 at guix/scripts/archive.scm:312:2 ()>]
In guix/store.scm:
 766: 2 [import-paths #<build-daemon 256.14 2ff7000> #<input: file 0>]
 400: 1 [process-stderr #<build-daemon 256.14 2ff7000> #<input: file 0>]
In unknown file:
   ?: 0 [bytevector-length #<eof>]

ERROR: In procedure bytevector-length:
ERROR: In procedure scm_c_bytevector_length: Wrong type argument in position 1 (expecting bytevector): #<eof>
--8<---------------cut here---------------end--------------->8---

To be continued...

Ludo’.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#17591: Offload hook fails to send files
  2014-05-27 15:16   ` Ludovic Courtès
@ 2014-05-27 20:56     ` Ludovic Courtès
  2014-05-28 15:13     ` Ludovic Courtès
  1 sibling, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2014-05-27 20:56 UTC (permalink / raw)
  To: 17591

ludo@gnu.org (Ludovic Courtès) skribis:

> 21927 write(10, "\1\0\0\0\0\0\0\0\0000\213\1\0\0\0\0nix-archive-1\0\0\0\1\0\0\0\0\0\0\0 0\213\1\0\0\0\0\4\0\0\0\0\0\0\0type\0\0\0\0\7\0\0\0\0\0\0\0@0\213\1\0\0\0\0\10\0\0\0\0\0\0\0contents\255\10\0\0\0\0\0\0`0\213\1\0\0\0\0use-modules (guix build \2000\213\1\0\0\0\0d-system) (guix build ut\2400\213\1\0\0\0\0efine %output (getenv \"o\3000\213\1\0\0\0\0efine %outputs (map (lam\3400"..., 11072 <unfinished ...>
> 21925 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 21926
>
> It reads 11072 bytes (as expected) from stdin, but by the time it writes
> them on the socket (the last write(10, ...) call), those 11072 bytes got
> corrupted.
>
> The above snippet corresponds to this part of store.scm:
>
>           ((= k %stderr-read)
>            ;; Read a byte stream from USER-PORT.
>            (let* ((max-len (read-int p))
>                   (data    (get-bytevector-n user-port max-len))
>                   (len     (bytevector-length data)))
>              (write-int len p)
>              (put-bytevector p data)
>              (write-padding len p)
>              #f))
>
> where ‘data’ is the corrupted bytevector.
>
> The good news is that I can reproduce it like this:
>
> $ guix archive --export $(guix build -d coreutils) > t.nar
> $ while guix archive --import <t.nar ; do : ; done

The problem comes from the ‘get-bytevector-n’ call above, and
specifically the ‘scm_c_shrink_bytevector’ and ‘GC_REALLOC’ calls it
entails.  If we change that to, say, ‘scm_make_bytevector’ (in
libguile), then the problem goes away.

So that looks like a GC_REALLOC bug (in libgc 7.4.0 and 7.2d at least),
but unfortunately I’ve been unable to reduce it further (neither in
Scheme nor in C.)

On the Guix side, commit 5895f24 works around the problem.  Now we need
a workaround in libguile, and ideally a proper bug report for libgc.

(I’ll close this bug when we confirm that it solves the problem on
Hydra.)

Ludo’.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#17591: Offload hook fails to send files
  2014-05-27 15:16   ` Ludovic Courtès
  2014-05-27 20:56     ` Ludovic Courtès
@ 2014-05-28 15:13     ` Ludovic Courtès
  1 sibling, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2014-05-28 15:13 UTC (permalink / raw)
  To: 17591

[-- Attachment #1: Type: text/plain, Size: 1191 bytes --]

ludo@gnu.org (Ludovic Courtès) skribis:

> $ guix archive --export $(guix build -d coreutils) > t.nar
> $ while guix archive --import <t.nar ; do : ; done

Slightly reduced test case: after creating t.nar as above, run the code
below in a loop until it stops (with a pre-5895f24 Guix!):

--8<---------------cut here---------------start------------->8---
(use-modules (guix)
             (gnu packages base)
             (rnrs io ports))

(with-store store
  (let* (;; (drv   (package-derivation store coreutils))
         ;; (dump  (call-with-bytevector-output-port
         ;;         (lambda (port)
         ;;           (export-paths store (list (derivation-file-name drv))
         ;;                         port))))
         (dump  (call-with-input-file "t.nar" get-bytevector-all)))
    (unless (pair?
             (pk 'import (import-paths store (open-bytevector-input-port dump))))
      (format #t
              "attach GDB to ~a, call GC_default_print_heap_obj_proc,etc.~%"
              (getpid))
      (sleep 1000)
      (exit 1))))
--8<---------------cut here---------------end--------------->8---

Also useful is to modify libguile like this:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 664 bytes --]

diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
index be8b654..9a8902f 100644
--- a/libguile/bytevectors.c
+++ b/libguile/bytevectors.c
@@ -315,10 +315,14 @@ scm_c_shrink_bytevector (SCM bv, size_t c_new_len)
   SCM_BYTEVECTOR_SET_LENGTH (bv, c_new_len);
 
   if (SCM_BYTEVECTOR_CONTIGUOUS_P (bv))
+    {
+      fprintf (stderr, "%s %zi -> %zi\n", __func__, c_len, c_new_len);
       new_bv = PTR2SCM (scm_gc_realloc (SCM2PTR (bv),
 					c_len + SCM_BYTEVECTOR_HEADER_BYTES,
 					c_new_len + SCM_BYTEVECTOR_HEADER_BYTES,
 					SCM_GC_BYTEVECTOR));
+      printf ("new_bv = %p, bv = %p\n", new_bv, bv);
+    }
   else
     {
       signed char *c_bv;

[-- Attachment #3: Type: text/plain, Size: 907 bytes --]


Here I see uninteresting things like:

--8<---------------cut here---------------start------------->8---
scm_c_shrink_bytevector 32768 -> 6960
new_bv = 0x1e96000, bv = 0x1e9a000
importing path `/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv'

;;; (import ("/gnu/store/5nhsz368f88bbgkjjwzz5k24nnnrk544-coreutils-8.22.drv"))
scm_c_shrink_bytevector 32768 -> 6960
new_bv = 0x1efb000, bv = 0x1eff000

;;; (import ())
attach GDB to 10124
--8<---------------cut here---------------end--------------->8---

Then I can call GC_default_print_heap_obj_proc(0x1efb000) from GDB,
which displays:

--8<---------------cut here---------------start------------->8---
object at 0x1efb000 of appr. 6992 bytes (atomic)
--8<---------------cut here---------------end--------------->8---

IOW, everything looks alright, except that the contents of new_bv really
are corrupt.

Ludo’.

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-05-28 15:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-25 20:59 bug#17591: Offload hook fails to send files Ludovic Courtès
2014-05-25 21:28 ` Ludovic Courtès
2014-05-27 15:16   ` Ludovic Courtès
2014-05-27 20:56     ` Ludovic Courtès
2014-05-28 15:13     ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).