unofficial mirror of bug-guix@gnu.org 
* bug#39925: `guix pull` failure in multi-machine setup
@ 2020-03-05 13:33 Lars-Dominik Braun
  2020-03-05 17:20 ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Lars-Dominik Braun @ 2020-03-05 13:33 UTC (permalink / raw)
  To: 39925

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]

Hi,

I’m using guix on a multi-machine setup with a single remote guix-daemon that
can be reached via SSH. Thus GUIX_DAEMON_SOCKET=ssh://master.<domain> on the
compute nodes. Running `guix pull` on master works fine (the variable is not
set here), but it does not on a compute node. Instead it fails with this error:

---snip---
Backtrace:
           1 (primitive-load "/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation")
In ice-9/eval.scm:
   293:34  0 (_ #(#(#(#(#(#(#(#(#(#(#(#(#<directory (guile-user) 7f19dd213140> (?)) #) # ?) ?) ?) ?) ?) ?) ?) ?) ?) ?))

ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "/var/guix/daemon-socket/socket" errno: 111] 7f19dba3a090>)'.
guix pull: error: You found a bug: the program '/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation'
failed to compute the derivation for Guix (version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; system: "x86_64-linux";
host version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; pull-version: 1).
Please report it by email to <bug-guix@gnu.org>.
---snap---

Obviously the socket on that compute machine is not working, because it’s on an
NFS share /var/guix belonging to master. But why is the socket considered in
the first place?

Cheers,
Lars



* bug#39925: `guix pull` failure in multi-machine setup
  2020-03-05 13:33 bug#39925: `guix pull` failure in multi-machine setup Lars-Dominik Braun
@ 2020-03-05 17:20 ` Ludovic Courtès
  2020-03-06  7:40   ` Lars-Dominik Braun
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2020-03-05 17:20 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 39925

[-- Attachment #1: Type: text/plain, Size: 2243 bytes --]

Hi,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

> I’m using guix on a multi-machine setup with a single remote guix-daemon that
> can be reached via SSH. Thus GUIX_DAEMON_SOCKET=ssh://master.<domain> on the
> compute nodes. Running `guix pull` on master works fine (the variable is not
> set here), but it does not on a compute node. Instead it fails with this error:
>
> ---snip---
> Backtrace:
>            1 (primitive-load "/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation")
> In ice-9/eval.scm:
>    293:34  0 (_ #(#(#(#(#(#(#(#(#(#(#(#(#<directory (guile-user) 7f19dd213140> (?)) #) # ?) ?) ?) ?) ?) ?) ?) ?) ?) ?))
>
> ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "/var/guix/daemon-socket/socket" errno: 111] 7f19dba3a090>)'.
> guix pull: error: You found a bug: the program '/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation'
> failed to compute the derivation for Guix (version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; system: "x86_64-linux";
> host version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; pull-version: 1).
> Please report it by email to <bug-guix@gnu.org>.
> ---snap---
>
> Obviously the socket on that compute machine is not working, because it’s on an
> NFS share /var/guix belonging to master. But why is the socket considered in
> the first place?

This is a limitation in ‘build-aux/build-self.scm’:

      ;; Use the port beneath the current store as the stdin of BUILD.  This
      ;; way, we know 'open-pipe*' will not close it on 'exec'.  If PORT is
      ;; not a file port (e.g., it's an SSH channel), then the subprocess's
      ;; stdin will actually be /dev/null.
      (let* ((pipe   (with-input-from-port port
                       (lambda ()
                          ;; …
                                      (if (file-port? port)  ;<- here
                                          (number->string
                                           (logior major minor))
                                          "none"))))))

We could work around it by letting the ‘GUIX_DAEMON_SOCKET’ environment
variable through, along these lines:


[-- Attachment #2: Type: text/x-patch, Size: 1124 bytes --]

diff --git a/build-aux/build-self.scm b/build-aux/build-self.scm
index f2e785b7f1..18a78b5f41 100644
--- a/build-aux/build-self.scm
+++ b/build-aux/build-self.scm
@@ -400,6 +400,7 @@ files."
                                              #:pull-version pull-version))
                       (system (if system (return system) (current-system)))
                       (home -> (getenv "HOME"))
+                      (daemon-socket -> (getenv "GUIX_DAEMON_SOCKET"))
 
                       ;; Note: Use the deprecated names here because the
                       ;; caller might be Guix <= 0.16.0.
@@ -424,6 +425,8 @@ files."
                           (when home
                             ;; Inherit HOME so that 'xdg-directory' works.
                             (setenv "HOME" home))
+                          (when (and (not (file-port? port) daemon-socket))
+                            (setenv "GUIX_DAEMON_SOCKET" daemon-socket))
                           (open-pipe* OPEN_READ
                                       (derivation->output-path build)
                                       source system version

[-- Attachment #3: Type: text/plain, Size: 280 bytes --]


It’s a bit hacky though, and won’t work with old Guix revisions anyway.
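To illustrate the idea behind the patch in isolation, here is a minimal shell sketch. It is not Guix code, and it assumes (as the patch context suggests, but I have not verified) that the subprocess does not simply inherit the caller's environment. The host name `master.example.org` and the variable names `child`, `scrubbed`, and `passed` are placeholders made up for the example.

```shell
#!/bin/sh
# Illustration only: none of this is Guix code.
# "master.example.org" is a placeholder host name.
GUIX_DAEMON_SOCKET="ssh://master.example.org"
export GUIX_DAEMON_SOCKET

# What a child process would connect to: the variable if it is set,
# otherwise the default local socket.
child='printf "%s" "${GUIX_DAEMON_SOCKET:-/var/guix/daemon-socket/socket}"'

# Child started with an empty environment: falls back to the local socket.
scrubbed=$(env -i /bin/sh -c "$child")

# Child started with the variable explicitly let through.
passed=$(env -i GUIX_DAEMON_SOCKET="$GUIX_DAEMON_SOCKET" /bin/sh -c "$child")

echo "scrubbed environment: $scrubbed"
echo "variable let through: $passed"
```

The first child falls back to the local socket path, which would match the path shown in the reported error; the second sees the ssh:// URI.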

However, for your use case, you could perhaps simply pull on one machine
and use ‘guix copy’ to send Guix elsewhere?  Or even explicitly run
‘guix pull’ on each node?

Thanks,
Ludo’.


* bug#39925: `guix pull` failure in multi-machine setup
  2020-03-05 17:20 ` Ludovic Courtès
@ 2020-03-06  7:40   ` Lars-Dominik Braun
  2020-03-06 10:53     ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Lars-Dominik Braun @ 2020-03-06  7:40 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39925

[-- Attachment #1: Type: text/plain, Size: 1346 bytes --]

Hi Ludo,

> This is a limitation in ‘build-aux/build-self.scm’: […]
I don’t understand what’s going on there unfortunately. Is there a high-level
explanation somewhere in the manual?

> We could work around it by letting the ‘GUIX_DAEMON_SOCKET’ environment
> variable through, along these lines:
Nope, that does not seem to be enough. After pulling on master, doing the same
on a node (with a patched guix) yields:

---snip---
ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "ssh://master.<domain>" errno: 95] 7f0f325f77b0>)'.
---snap---

Any ideas?

> +                          (when (and (not (file-port? port) daemon-socket))
(when (and (not (file-port? port)) daemon-socket)
I assume:                        ↑

> […] and won’t work with old Guix revisions anyway.
That means `guix time-machine` could not go back beyond a commit that fixes the
issue, correct? Not a concern for me.

> However, for your use case, you could perhaps simply pull on one machine
> and use ‘guix copy’ to send Guix elsewhere?
The store is the same on all machines, since /gnu/store, /var/guix and /home
are all shared via NFS. As far as I understand the manual `guix copy` would be
useful for store to store transfers on different machines only.

Lars



* bug#39925: `guix pull` failure in multi-machine setup
  2020-03-06  7:40   ` Lars-Dominik Braun
@ 2020-03-06 10:53     ` Ludovic Courtès
  2020-03-06 11:45       ` Lars-Dominik Braun
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2020-03-06 10:53 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 39925

Hello,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> This is a limitation in ‘build-aux/build-self.scm’: […]
> I don’t understand what’s going on there unfortunately. Is there a high-level
> explanation somewhere in the manual?
>
>> We could work around it by letting the ‘GUIX_DAEMON_SOCKET’ environment
>> variable through, along these lines:
> Nope, that does not seem to be enough. After pulling on master, doing the same
> on a node (with a patched guix) yields:
>
> ---snip---
> ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "ssh://master.<domain>" errno: 95] 7f0f325f77b0>)'.
> ---snap---
>
> Any ideas?

Sounds like this ssh URI is not valid on the nodes, is that right?

>> +                          (when (and (not (file-port? port) daemon-socket))
> (when (and (not (file-port? port)) daemon-socket)
> I assume:                        ↑
>
>> […] and won’t work with old Guix revisions anyway.
> That means `guix time-machine` could not go back beyond a commit that fixes the
> issue, correct? Not a concern for me.

Correct.

>> However, for your use case, you could perhaps simply pull on one machine
>> and use ‘guix copy’ to send Guix elsewhere?
> The store is the same on all machines, since /gnu/store, /var/guix and /home
> are all shared via NFS. As far as I understand the manual `guix copy` would be
> useful for store to store transfers on different machines only.

Right.  So perhaps I don’t quite understand the use case.  What about
simply pulling from one of these machines, if everything is shared over
NFS?

HTH,
Ludo’.


* bug#39925: `guix pull` failure in multi-machine setup
  2020-03-06 10:53     ` Ludovic Courtès
@ 2020-03-06 11:45       ` Lars-Dominik Braun
  2020-03-08 11:40         ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Lars-Dominik Braun @ 2020-03-06 11:45 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39925

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

Hi,

> Sounds like this ssh URI is not valid on the nodes, is that right?
I would consider it valid, since `ssh master.<domain>` and `guix build
<package>` both work just fine from the nodes. It’s just `guix pull`, which is
causing issues.

> Right.  So perhaps I don’t quite understand the use case.  What about
> simply pulling from one of these machines, if everything is shared over
> NFS?
Sure, that’s an option, but anyone who tries will get a strange error message.
And it breaks the appeal of having a remote guix daemon in the first place,
that is being able to run `guix <whatever>` on any machine I log into. If that
is not the case (i.e. not for `guix pull`) it would be more consistent to ask
users to SSH into a different machine every time they interact with guix. Does
that explain my use case?

Lars



* bug#39925: `guix pull` failure in multi-machine setup
  2020-03-06 11:45       ` Lars-Dominik Braun
@ 2020-03-08 11:40         ` Ludovic Courtès
  2020-03-09  8:22           ` Lars-Dominik Braun
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2020-03-08 11:40 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 39925

Hi,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> Sounds like this ssh URI is not valid on the nodes, is that right?
> I would consider it valid, since `ssh master.<domain>` and `guix build
> <package>` both work just fine from the nodes. It’s just `guix pull`, which is
> causing issues.

Oh it may be that we would also need to let ‘HOME’ through, so that
~/.ssh/config is found, for example.  That could have undesirable side
effects that are best avoided, though (e.g., ~/.cache/guile would become
visible.)

>> Right.  So perhaps I don’t quite understand the use case.  What about
>> simply pulling from one of these machines, if everything is shared over
>> NFS?
> Sure, that’s an option, but anyone who tries will get a strange error message.

I agree that the error message is sub-optimal.  Not sure how to improve
on it (how can ‘build-self.scm’ know that it’s failing because of
that?).

> And it breaks the appeal of having a remote guix daemon in the first place,
> that is being able to run `guix <whatever>` on any machine I log into. If that
> is not the case (i.e. not for `guix pull`) it would be more consistent to ask
> users to SSH into a different machine every time they interact with guix. Does
> that explain my use case?

Instead of:

  GUIX_DAEMON_SOCKET=ssh://host guix pull

You could run:

  ssh host guix pull

In fact, the former would probably not work because ‘guix pull’ modifies
the local /var/guix/profiles, not the one on the host that runs the
daemon.

So maybe the problem is that ‘GUIX_DAEMON_SOCKET=ssh://’ isn’t quite as
powerful as you thought.  :-)  It’s really just a way to talk to a remote
daemon, but ‘guix pull’, ‘guix package’, etc. also need to access
/var/guix/profiles.

Thanks,
Ludo’.


* bug#39925: `guix pull` failure in multi-machine setup
  2020-03-08 11:40         ` Ludovic Courtès
@ 2020-03-09  8:22           ` Lars-Dominik Braun
  2020-03-09 10:46             ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Lars-Dominik Braun @ 2020-03-09  8:22 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39925

[-- Attachment #1: Type: text/plain, Size: 1492 bytes --]

Hi Ludo,

> Oh it may be that we would also need to let ‘HOME’ through, so that
> ~/.ssh/config is found, for example.  That could have undesirable side
> effects that are best avoided, though (e.g., ~/.cache/guile would become
> visible.)
That shouldn’t be a problem, since ~/.ssh/config does not exist for that user and
known hosts are globally declared in /etc/ssh/ssh_known_hosts (strace indicates
that guile-ssh/libssh reads that file).

> I agree that the error message is sub-optimal.  Not sure how to improve
> on it (how can ‘build-self.scm’ know that it’s failing because of
> that?).
If I stop the daemon and run `guix pull`, it just says “guix pull: error: failed to
connect to `/var/guix/daemon-socket/socket': Connection refused”. Something
similar should do. I don’t know whether that’s possible though.

> You could run:
>   ssh host guix pull
Sure, that’s the only workaround I can think of right now.

> In fact, the former would probably not work because ‘guix pull’ modifies
> the local /var/guix/profiles, not the one on the host that runs the
> daemon.
Yes, /var/guix is shared via NFS too. Otherwise roaming between machines
wouldn’t work at all.

> So maybe the problem is that ‘GUIX_DAEMON_SOCKET=ssh://’ isn’t quite as
> powerful as you thought.  :-)
It is, it’s just a bug we have to fix :) Can I help you debug this somehow,
i.e. figure out where exactly the error message is coming from?

Cheers,
Lars



* bug#39925: `guix pull` failure in multi-machine setup
  2020-03-09  8:22           ` Lars-Dominik Braun
@ 2020-03-09 10:46             ` Ludovic Courtès
  2020-03-10  7:19               ` Lars-Dominik Braun
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2020-03-09 10:46 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 39925

Hi!

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> In fact, the former would probably not work because ‘guix pull’ modifies
>> the local /var/guix/profiles, not the one on the host that runs the
>> daemon.
> Yes, /var/guix is shared via NFS too. Otherwise roaming between machines
> wouldn’t work at all.
>
>> So maybe the problem is that ‘GUIX_DAEMON_SOCKET=ssh://’ isn’t quite as
>> powerful as you thought.  :-)
> It is, it’s just a bug we have to fix :) Can I help you debug this somehow,
> i.e. figure out where exactly the error message is coming from?

Well, I think you’re really asking for a new feature; we need more than
just talking to a remote daemon.

Updating a profile, as ‘guix package’ and ‘guix pull’ do, involves two
things:

  1. building the profile—this is done by talking to the daemon;

  2. modifying things in /var/guix/profiles & co.

GUIX_DAEMON_SOCKET addresses #1 but not #2.
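As a rough picture of what #2 amounts to, here is a shell sketch that mimics a profile switch with symlinks in a scratch directory. It is an assumption-laden stand-in, not what Guix runs: the directory names and the `aaaa`/`bbbb` store items are made up, and `mv -T` assumes GNU coreutils.

```shell
#!/bin/sh
set -e
# Scratch stand-ins for /gnu/store and /var/guix/profiles (hypothetical).
root=$(mktemp -d)
mkdir -p "$root/store/aaaa-profile" "$root/store/bbbb-profile" "$root/profiles"
cd "$root/profiles"

# Generation 1 is the current one.
ln -s "$root/store/aaaa-profile" guix-profile-1-link
ln -s guix-profile-1-link guix-profile

# Step 1 (building) would produce a new store item through the daemon;
# "bbbb-profile" stands in for its result here.

# Step 2: register generation 2 and switch to it. This only touches the
# file system that holds the profiles directory, not the daemon.
ln -s "$root/store/bbbb-profile" guix-profile-2-link
ln -s guix-profile-2-link guix-profile.new
mv -T guix-profile.new guix-profile   # rename over the old link (GNU mv)

current=$(readlink guix-profile)
echo "current generation: $current"
```

The point of the sketch is only that the second step is plain file-system work on whichever machine runs the command.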

For #2, we would need to do something like Jakub did in (guix scripts
system reconfigure), where the effectful bits can be transparently
evaluated either locally or remotely.

But really, that’d be a brand new feature, so I’m marking it as a
wishlist if you don’t mind.  :-)

Thanks,
Ludo’.


* bug#39925: `guix pull` failure in multi-machine setup
  2020-03-09 10:46             ` Ludovic Courtès
@ 2020-03-10  7:19               ` Lars-Dominik Braun
  0 siblings, 0 replies; 9+ messages in thread
From: Lars-Dominik Braun @ 2020-03-10  7:19 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39925

[-- Attachment #1: Type: text/plain, Size: 468 bytes --]

Hey Ludo,

> For #2, we would need to do something like Jakub did in (guix scripts
> system reconfigure), where the effectful bits can be transparently
> evaluated either locally or remotely.
I don’t understand why #2 needs different mechanics. As I said, /var/guix is
mounted r/w on every machine and in fact `guix package -i` is working as
intended.

Maybe we’ve got a communication issue here and we’re talking about two
different things?

Lars



end of thread, other threads:[~2020-03-10  7:20 UTC | newest]

