all messages for Guix-related lists mirrored at yhetil.org
* bug#61646: Bandwidth-induced offload timeout abort whole operating
@ 2023-02-20  3:28 Maxim Cournoyer
  2023-02-23 22:26 ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Maxim Cournoyer @ 2023-02-20  3:28 UTC
  To: 61646

Hi Guix,

I can reproduce this rather easily on my system:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build icedove
The following derivations will be built:
  /gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv
  /gnu/store/8zi808086b3vlfjrhdm87fgljziwdqx2-icedove-l10n-102.7.2.drv
  /gnu/store/v0sq7rb8fk36kjasb27a71z1a27wxb1s-icedove-minimal-102.7.2.drv
process 19542 acquired build slot '/var/guix/offload/localhost:6666/0'
normalized load on machine 'localhost' is 0.08
building /gnu/store/8zi808086b3vlfjrhdm87fgljziwdqx2-icedove-l10n-102.7.2.drv...
process 19548 acquired build slot '/var/guix/offload/localhost:6666/1'
normalized load on machine 'localhost' is 0.08
building /gnu/store/v0sq7rb8fk36kjasb27a71z1a27wxb1s-icedove-minimal-102.7.2.drv...
guix offload: sending 1 store item (558 MiB) to 'localhost'...
exporting path `/gnu/store/bwb5hcdyzgq16kmbsva7ax0zq6lzg78z-icedove-102.7.2.tar.xz'
guix offload: error: failed to connect to 'localhost': Timeout connecting to localhost
cannot build derivation `/gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv': 1 dependencies couldn't be built
guix build: error: build of
  `/gnu/store/l6r93asndd0kwv7024iyrl71zd0lbpbq-icedove-102.7.2.drv' failed
--8<---------------cut here---------------end--------------->8---

The third derivation tries to get a build slot and times out, because
the first two have already saturated the bandwidth of the link and it
takes more time than expected to get a reply.

The workaround is to use '-k' (for "--keep-going") and retry the third,
failing derivation after the first two have completed.

I don't have a clear idea of how to improve the situation other than
using longer timeouts... but perhaps these timeouts could be made
dynamic, based on the network or CPU load?
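
For illustration only, here is a rough Guile sketch of one possible
heuristic.  It assumes, hypothetically, that the offload code knows how
many build slots on the target machine are already busy;
'slot-adjusted-timeout' is not an existing Guix procedure, and the
constants are arbitrary.  CPU load alone would not have helped in this
case, since the normalized load reported above was only 0.08.

```scheme
;; Hypothetical helper, not part of Guix: pick an SSH connection
;; timeout (in seconds) that grows with the number of build slots
;; already in use on the target machine, since each busy slot may be
;; transferring store items over the same link.
(define* (slot-adjusted-timeout busy-slots
                                #:key (base 10) (per-slot 10) (maximum 120))
  "Return BASE plus PER-SLOT seconds for each of BUSY-SLOTS, capped at
MAXIMUM."
  (min maximum (+ base (* per-slot busy-slots))))

;; Example: the third connection attempt, with two slots already busy.
(display (slot-adjusted-timeout 2))   ;prints 30
(newline)
```

A linear per-slot increment is just the simplest choice; measuring the
actual throughput of the link would be more accurate, which this sketch
does not attempt.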

-- 
Thanks,
Maxim





* bug#61646: Bandwidth-induced offload timeout abort whole operating
  2023-02-20  3:28 bug#61646: Bandwidth-induced offload timeout abort whole operating Maxim Cournoyer
@ 2023-02-23 22:26 ` Ludovic Courtès
  2023-02-25  2:46   ` Maxim Cournoyer
  2023-02-25  3:07   ` Maxim Cournoyer
  0 siblings, 2 replies; 4+ messages in thread
From: Ludovic Courtès @ 2023-02-23 22:26 UTC
  To: Maxim Cournoyer; +Cc: 61646


Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I can reproduce this rather easily on my system:
>
> [...]
>
> The third derivation tries to get a build slot and times out, because
> the first two have already saturated the bandwidth of the link and it
> takes more time than expected to get a reply.

Weird.  Since it’s a timeout while connecting, I suppose the patch
below would improve the situation:

diff --git a/guix/scripts/offload.scm b/guix/scripts/offload.scm
index 578b3b9888..90cf97401c 100644
--- a/guix/scripts/offload.scm
+++ b/guix/scripts/offload.scm
@@ -220,7 +220,7 @@ (define* (open-ssh-session machine #:optional max-silent-time)
         (session (make-session #:user (build-machine-user machine)
                                #:host (build-machine-name machine)
                                #:port (build-machine-port machine)
-                               #:timeout 10       ;initial timeout (seconds)
+                               #:timeout 30       ;initial timeout (seconds)
                                ;; #:log-verbosity 'protocol
                                #:identity (build-machine-private-key machine)
 

WDYT?

Ludo’.


* bug#61646: Bandwidth-induced offload timeout abort whole operating
  2023-02-23 22:26 ` Ludovic Courtès
@ 2023-02-25  2:46   ` Maxim Cournoyer
  2023-02-25  3:07   ` Maxim Cournoyer
  1 sibling, 0 replies; 4+ messages in thread
From: Maxim Cournoyer @ 2023-02-25  2:46 UTC
  To: Ludovic Courtès; +Cc: 61646

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> [...]
>
> Weird.  Since it’s a timeout while connecting, I suppose the patch
> below would improve the situation:
>
> diff --git a/guix/scripts/offload.scm b/guix/scripts/offload.scm
> index 578b3b9888..90cf97401c 100644
> --- a/guix/scripts/offload.scm
> +++ b/guix/scripts/offload.scm
> @@ -220,7 +220,7 @@ (define* (open-ssh-session machine #:optional max-silent-time)
>          (session (make-session #:user (build-machine-user machine)
>                                 #:host (build-machine-name machine)
>                                 #:port (build-machine-port machine)
> -                               #:timeout 10       ;initial timeout (seconds)
> +                               #:timeout 30       ;initial timeout (seconds)
>                                 ;; #:log-verbosity 'protocol
>                                 #:identity (build-machine-private-key machine)

Hm, how can I test this again?

I tried launching a daemon both on the remote and locally, with
something like:

sudo -E ./pre-inst-env ./guix-daemon --build-users-group guixbuild \
 --max-silent-time 0 --timeout 0 --log-compression none --discover=yes \
 --substitute-urls "https://ci.guix.gnu.org https://bordeaux.guix.gnu.org" \
 --max-jobs=20

and the edited code doesn't seem to run (I put an (error 'hello) in
there and nothing happened).

-- 
Thanks,
Maxim





* bug#61646: Bandwidth-induced offload timeout abort whole operating
  2023-02-23 22:26 ` Ludovic Courtès
  2023-02-25  2:46   ` Maxim Cournoyer
@ 2023-02-25  3:07   ` Maxim Cournoyer
  1 sibling, 0 replies; 4+ messages in thread
From: Maxim Cournoyer @ 2023-02-25  3:07 UTC
  To: Ludovic Courtès; +Cc: 61646-done

Hello,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> Weird.  Since it’s a timeout while connecting, I suppose the patch
> below would improve the situation:
>
> diff --git a/guix/scripts/offload.scm b/guix/scripts/offload.scm
> index 578b3b9888..90cf97401c 100644
> --- a/guix/scripts/offload.scm
> +++ b/guix/scripts/offload.scm
> @@ -220,7 +220,7 @@ (define* (open-ssh-session machine #:optional max-silent-time)
>          (session (make-session #:user (build-machine-user machine)
>                                 #:host (build-machine-name machine)
>                                 #:port (build-machine-port machine)
> -                               #:timeout 10       ;initial timeout (seconds)
> +                               #:timeout 30       ;initial timeout (seconds)
>                                 ;; #:log-verbosity 'protocol
>                                 #:identity (build-machine-private-key machine)

Never mind my previous message: --sysconfdir had not been set, so my
offload setup (/etc/guix/machines.scm) was being ignored.  The following
command worked to test the change from the local machine:

--8<---------------cut here---------------start------------->8---
sudo -E ./pre-inst-env ./guix-daemon --build-users-group guixbuild \
 --max-silent-time 0 --timeout 0 --log-compression none --discover=yes \
 --substitute-urls "https://ci.guix.gnu.org https://bordeaux.guix.gnu.org" \
 --max-jobs=4
--8<---------------cut here---------------end--------------->8---

I pushed the fix in commit 53d718f61b.

Closing, thank you!

-- 
Thanks,
Maxim






Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git
