unofficial mirror of bug-guix@gnu.org 
* bug#36754: New linux-libre failed to build on armhf on Berlin
@ 2019-07-21 23:56 Mark H Weaver
  2019-07-22 16:10 ` Ricardo Wurmus
  2019-07-22 17:13 ` Mark H Weaver
  0 siblings, 2 replies; 17+ messages in thread
From: Mark H Weaver @ 2019-07-21 23:56 UTC (permalink / raw)
  To: 36754

In commit 1ad9c105c208caa9059924cbfbe4759c8101f6c9, I changed our
linux-libre packages to deblob the linux-libre source tarballs
ourselves, i.e. to run the deblobbing scripts provided by the
linux-libre project to produce linux-libre source tarballs from the
upstream linux tarballs:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=1ad9c105c208caa9059924cbfbe4759c8101f6c9

The following queries show that the updated packages built successfully
on x86_64, i686, and aarch64, but they all failed on armhf:

  https://ci.guix.gnu.org/search?query=linux-libre-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-4.19.60
  https://ci.guix.gnu.org/search?query=linux-libre-4.14.134
  https://ci.guix.gnu.org/search?query=linux-libre-4.9.186
  https://ci.guix.gnu.org/search?query=linux-libre-4.4.186
  https://ci.guix.gnu.org/search?query=linux-libre-arm-veyron-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-arm-generic-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-arm-generic-4.19.60
  https://ci.guix.gnu.org/search?query=linux-libre-arm-generic-4.14.134
  https://ci.guix.gnu.org/search?query=linux-libre-arm-omap2plus-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-arm-omap2plus-4.19.60
  https://ci.guix.gnu.org/search?query=linux-libre-arm-omap2plus-4.14.134

Unfortunately, I'm unable to get *any* information about what went wrong
from Cuirass.  None of the failed builds have associated log files, and
the build details page has no useful information either.  For example:

  https://ci.guix.gnu.org/build/1488517/details

My first guess was that something went wrong in the 'computed' origin
that runs the deblobbing script.  However, that's apparently not the
case, because all of the updated 'linux-libre-headers' packages built
successfully on armhf, and those use the same source tarballs as the
main 'linux-libre' packages.

  https://ci.guix.gnu.org/search?query=linux-libre-headers-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-headers-4.19.60
  https://ci.guix.gnu.org/search?query=linux-libre-headers-4.14.134

Can someone help me find out what's going on here?  Until then, I'm
sorry to say that armhf-linux users will be unable to update their
systems.

       Mark


* bug#36754: New linux-libre failed to build on armhf on Berlin
  2019-07-21 23:56 bug#36754: New linux-libre failed to build on armhf on Berlin Mark H Weaver
@ 2019-07-22 16:10 ` Ricardo Wurmus
  2019-07-22 17:13 ` Mark H Weaver
  1 sibling, 0 replies; 17+ messages in thread
From: Ricardo Wurmus @ 2019-07-22 16:10 UTC (permalink / raw)
  To: mhw; +Cc: 36754


Mark H Weaver <mhw@netris.org> writes:

> Unfortunately, I'm unable to get *any* information about what went wrong
> from Cuirass.  None of the failed builds have associated log files, and
> the build details page has no useful information either.  For example:
>
>   https://ci.guix.gnu.org/build/1488517/details

On that page I see a link to the build log, but it appears to be
truncated:

    https://ci.guix.gnu.org/log/33hv7mij9bqqgf5hqwrw14106z9zgav9-linux-libre-5.2.2

Maybe the build node died before the build could be completed?

-- 
Ricardo


* bug#36754: New linux-libre failed to build on armhf on Berlin
  2019-07-21 23:56 bug#36754: New linux-libre failed to build on armhf on Berlin Mark H Weaver
  2019-07-22 16:10 ` Ricardo Wurmus
@ 2019-07-22 17:13 ` Mark H Weaver
  2019-07-23 16:46   ` Marius Bakke
  1 sibling, 1 reply; 17+ messages in thread
From: Mark H Weaver @ 2019-07-22 17:13 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 36754

Hi Ricardo,

Interesting.  I distinctly remember that there was no log file when I
looked last time.  Hmm.

Anyway, it seems that now, all of the failed builds have either build
logs available or else information about which dependency failed.  I
don't remember seeing any of this last time, but I'm glad to see it now.

A pattern has now emerged, but I don't know what it means.  All of the
armhf kernel builds failed except for linux-libre-arm-veyron-5.2.2,
which succeeded:

  https://ci.guix.gnu.org/build/1488502/details  (arm-veyron-5.2.2)

Apart from this anomalous success, all of the armhf 5.2.2 and 4.19.60
builds have truncated log files:

  https://ci.guix.gnu.org/build/1488517/details  (5.2.2)
  https://ci.guix.gnu.org/build/1488503/details  (4.19.60)
  https://ci.guix.gnu.org/build/1488513/details  (arm-generic-5.2.2)
  https://ci.guix.gnu.org/build/1488519/details  (arm-generic-4.19.60)
  https://ci.guix.gnu.org/build/1488504/details  (arm-omap2plus-5.2.2)
  https://ci.guix.gnu.org/build/1488501/details  (arm-omap2plus-4.19.60)

This pattern seems too regular to be a coincidence.  Can we find out
which build machines were used for these builds?

All of the 4.14.134 builds failed in the deblobbing step, due to timeout
(1 hour of silence) while packing the linux-libre tarball:

  https://ci.guix.gnu.org/build/1488514/details  (4.14.134)
  https://ci.guix.gnu.org/build/1488515/details  (arm-generic-4.14.134)
  https://ci.guix.gnu.org/build/1488512/details  (arm-omap2plus-4.14.134)

I'm not sure how to deal with this.  This is a computed origin, not a
normal package, and so I don't see a way to configure a longer timeout.

Perhaps I should make the tarball packing and unpacking operations
verbose, to work around the issue.  Of course that's our usual practice,
but I find it suboptimal because any warnings will be buried in a
mountain of uninteresting output.
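
As an illustration only (not code from Guix or Cuirass), the effect of a
silence timeout can be modelled with a small Python sketch.  The function
name and its parameters are hypothetical; the 3600-second limit is the
"1 hour of silence" mentioned above.  It shows why a quiet packing step
that runs longer than the limit is killed, while a step that prints
something regularly is not:

```python
from typing import Iterable, Optional


def first_silence_timeout(output_times: Iterable[float],
                          start: float,
                          end: float,
                          max_silence: float) -> Optional[float]:
    """Return the time at which a build would be killed for being silent,
    or None if it never stays quiet for longer than max_silence seconds.

    output_times are the moments (in seconds) at which the build printed
    something; start and end delimit the build itself.
    """
    last = start
    for t in sorted(output_times):
        if t - last > max_silence:
            # The gap since the last output exceeded the limit.
            return last + max_silence
        last = t
    if end - last > max_silence:
        # The build went quiet after its last line of output.
        return last + max_silence
    return None


# A 90-minute packing step with no output hits the 1-hour limit.
quiet = first_silence_timeout([], start=0, end=5400, max_silence=3600)

# The same step printing one line per minute is never killed.
verbose = first_silence_timeout(range(0, 5400, 60),
                                start=0, end=5400, max_silence=3600)
```

In this model the quiet run is stopped at the 3600-second mark and the
verbose run completes, which is the trade-off described above: verbose
output avoids the timeout at the cost of a noisier log.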

Thoughts?  Anyway, thanks for looking into it.

       Mark


* bug#36754: New linux-libre failed to build on armhf on Berlin
  2019-07-22 17:13 ` Mark H Weaver
@ 2019-07-23 16:46   ` Marius Bakke
  2019-07-23 17:33     ` bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds (was: New linux-libre failed to build on armhf on Berlin) Mark H Weaver
  0 siblings, 1 reply; 17+ messages in thread
From: Marius Bakke @ 2019-07-23 16:46 UTC (permalink / raw)
  To: Mark H Weaver, Ricardo Wurmus; +Cc: 36754

Mark H Weaver <mhw@netris.org> writes:

> Hi Ricardo,
>
> Interesting.  I distinctly remember that there was no log file when I
> looked last time.  Hmm.
>
> Anyway, it seems that now, all of the failed builds have either build
> logs available or else information about which dependency failed.  I
> don't remember seeing any of this last time, but I'm glad to see it now.
>
> A pattern has now emerged, but I don't know what it means.  All of the
> armhf kernel builds failed except for linux-libre-arm-veyron-5.2.2,
> which succeeded:
>
>   https://ci.guix.gnu.org/build/1488502/details  (arm-veyron-5.2.2)
>
> Apart from this anomalous success, all of the armhf 5.2.2 and 4.19.60
> builds have truncated log files:
>
>   https://ci.guix.gnu.org/build/1488517/details  (5.2.2)
>   https://ci.guix.gnu.org/build/1488503/details  (4.19.60)
>   https://ci.guix.gnu.org/build/1488513/details  (arm-generic-5.2.2)
>   https://ci.guix.gnu.org/build/1488519/details  (arm-generic-4.19.60)
>   https://ci.guix.gnu.org/build/1488504/details  (arm-omap2plus-5.2.2)
>   https://ci.guix.gnu.org/build/1488501/details  (arm-omap2plus-4.19.60)
>
> This pattern seems too regular to be a coincidence.  Can we find out
> which build machines were used for these builds?

I tried building 5.2.2 'interactively' on Berlin, and got an SSH error:

  CC [M]  net/openvswitch/vport-geneve.o
  CC [M]  net/openvswitch/vport-gre.o
  LD [M]  net/openvswitch/openvswitch.o
;;; [2019/07/23 05:14:53.501502, 0] read_from_channel_port: [GSSH ERROR] Error reading from the channel: #<input-output: channel (closed) 14c0e60>
Backtrace:
          16 (apply-smob/1 #<catch-closure b79640>)
In ice-9/boot-9.scm:
    705:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
In ice-9/eval.scm:
    619:8 14 (_ #(#(#<directory (guile-user) bfb140>)))
In guix/ui.scm:
  1747:12 13 (run-guix-command _ . _)
In guix/scripts/offload.scm:
   781:22 12 (guix-offload . _)
In ice-9/boot-9.scm:
    829:9 11 (catch _ _ #<procedure 7f576678d910 at guix/ui.scm:703…> …)
    829:9 10 (catch _ _ #<procedure 7f576678d928 at guix/ui.scm:826…> …)
In guix/scripts/offload.scm:
   580:19  9 (process-request _ _ _ _ #:print-build-trace? _ # _ # _)
    531:6  8 (call-with-timeout _ _ _)
    361:2  7 (transfer-and-offload #<derivation /gnu/store/yfns7ga4…> …)
In ice-9/boot-9.scm:
    829:9  6 (catch _ _ #<procedure dbdab0 at guix/scripts/offload.…> …)
In guix/scripts/offload.scm:
    385:6  5 (_)
In guix/store.scm:
  1203:15  4 (_ #<store-connection 256.99 19a0ba0> _ _)
   692:11  3 (process-stderr #<store-connection 256.99 19a0ba0> _)
In guix/serialization.scm:
    87:11  2 (read-int _)
    73:12  1 (get-bytevector-n* #<input-output: channel (closed) 14…> …)
In unknown file:
           0 (get-bytevector-n #<input-output: channel (closed) 14c…> …)

ERROR: In procedure get-bytevector-n:
Throw to key `guile-ssh-error' with args `("read_from_channel_port" "Error reading from the channel" #<input-output: channel (closed) 14c0e60> #f)'.
guix build: error: build of `/gnu/store/yfns7ga468vmv9jn72snk79b16p8mhfa-linux-libre-5.2.2.drv' failed

real    637m24.906s
user    0m6.661s
sys     0m0.897s

Unfortunately I failed to record which machine was used and don't know a
way to find out after the fact.


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds (was: New linux-libre failed to build on armhf on Berlin)
  2019-07-23 16:46   ` Marius Bakke
@ 2019-07-23 17:33     ` Mark H Weaver
  2019-07-23 17:49       ` Mark H Weaver
  0 siblings, 1 reply; 17+ messages in thread
From: Mark H Weaver @ 2019-07-23 17:33 UTC (permalink / raw)
  To: Marius Bakke; +Cc: 36754

retitle 36754 SSH connections to hydra-slave{1,2,3} fail during builds
thanks

Hi,

I've added Ludovic to the CC list, since he recently added
hydra-slave{1,2,3} to Berlin.

Marius wrote:
> I tried building 5.2.2 'interactively' on Berlin, and got an SSH error:
> 
>   CC [M]  net/openvswitch/vport-geneve.o
>   CC [M]  net/openvswitch/vport-gre.o
>   LD [M]  net/openvswitch/openvswitch.o
> ;;; [2019/07/23 05:14:53.501502, 0] read_from_channel_port: [GSSH ERROR] Error reading from the channel: #<input-output: channel (closed) 14c0e60>
> Backtrace:
>           16 (apply-smob/1 #<catch-closure b79640>)
> In ice-9/boot-9.scm:
>     705:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
> In ice-9/eval.scm:
>     619:8 14 (_ #(#(#<directory (guile-user) bfb140>)))
> In guix/ui.scm:
>   1747:12 13 (run-guix-command _ . _)
> In guix/scripts/offload.scm:
>    781:22 12 (guix-offload . _)
> In ice-9/boot-9.scm:
>     829:9 11 (catch _ _ #<procedure 7f576678d910 at guix/ui.scm:703…> …)
>     829:9 10 (catch _ _ #<procedure 7f576678d928 at guix/ui.scm:826…> …)
> In guix/scripts/offload.scm:
>    580:19  9 (process-request _ _ _ _ #:print-build-trace? _ # _ # _)
>     531:6  8 (call-with-timeout _ _ _)
>     361:2  7 (transfer-and-offload #<derivation /gnu/store/yfns7ga4…> …)
> In ice-9/boot-9.scm:
>     829:9  6 (catch _ _ #<procedure dbdab0 at guix/scripts/offload.…> …)
> In guix/scripts/offload.scm:
>     385:6  5 (_)
> In guix/store.scm:
>   1203:15  4 (_ #<store-connection 256.99 19a0ba0> _ _)
>    692:11  3 (process-stderr #<store-connection 256.99 19a0ba0> _)
> In guix/serialization.scm:
>     87:11  2 (read-int _)
>     73:12  1 (get-bytevector-n* #<input-output: channel (closed) 14…> …)
> In unknown file:
>            0 (get-bytevector-n #<input-output: channel (closed) 14c…> …)
> 
> ERROR: In procedure get-bytevector-n:
> Throw to key `guile-ssh-error' with args `("read_from_channel_port" "Error reading from the channel" #<input-output: channel (closed) 14c0e60> #f)'.
> guix build: error: build of `/gnu/store/yfns7ga468vmv9jn72snk79b16p8mhfa-linux-libre-5.2.2.drv' failed
> 
> real    637m24.906s
> user    0m6.661s
> sys     0m0.897s

Thank you, this is helpful.

> Unfortunately I failed to record which machine was used and don't know a
> way to find out after the fact.

I believe it was hydra-slave2, one of the three armhf machines that I
host which were formerly part of hydra.gnu.org's build farm and were
recently added to Berlin by Ludovic.  I checked hydra-slave{1,2,3} for
build log files corresponding to the derivation above, and found that
all three of them have attempted it recently:

hydra-slave2 attempted to build it on July 23 08:07 UTC.
hydra-slave3 attempted to build it on July 22 16:40 UTC.
hydra-slave1 attempted to build it on July 22 04:44 UTC.

To be precise, each of those dates corresponds to the end of the build
attempt.  All three build logs are truncated on the build machine as
well, with no error message at the end.

I now believe that these failures are related to the newly added armhf
build slaves, and that they have nothing to do with the recent changes
to our linux-libre packages.

Well, except for the silence timeout that sometimes happens on slower
machines while deblobbing linux-libre.  That's a separate issue.

      Thanks,
        Mark


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds (was: New linux-libre failed to build on armhf on Berlin)
  2019-07-23 17:33     ` bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds (was: New linux-libre failed to build on armhf on Berlin) Mark H Weaver
@ 2019-07-23 17:49       ` Mark H Weaver
  2019-07-23 21:26         ` bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds Ludovic Courtès
  0 siblings, 1 reply; 17+ messages in thread
From: Mark H Weaver @ 2019-07-23 17:49 UTC (permalink / raw)
  To: Marius Bakke; +Cc: 36754

I wrote earlier:
> I now believe that these failures are related to the newly added armhf
> build slaves, and that they have nothing to do with the recent changes
> to our linux-libre packages.

I should mention that the armhf build slaves are on a private network,
and I use my public-facing internet server to forward TCP connections to
them, using the following entries in /etc/inetd.conf:

--8<---------------cut here---------------start------------->8---
# TCP-level forwards for SSH connections to build machines for the GNU
# Guix build farm:
7275    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.11 7275
7276    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.12 7276
7274    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.13 7274
--8<---------------cut here---------------end--------------->8---

It's possible that this arrangement is somehow part of the problem.
However, note that nothing has changed here in several years, and it
worked fine on hydra.gnu.org.  The build slaves were running a *very*
old version of Guix though.  It seems likely that the new Guile-SSH code
doesn't cope well with this setup.
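
As an illustration only (not part of the actual setup), the forwarding
layout in the inetd.conf entries above can be summarised with a short
Python sketch.  The names `Forward`, `parse_forwards` and `CONF` are
hypothetical, and the meaning of netcat's `-w` value is left open because
it differs between netcat variants:

```python
from typing import List, NamedTuple


class Forward(NamedTuple):
    listen_port: int   # port inetd listens on (public-facing server)
    target_host: str   # private address of the build machine
    target_port: int   # SSH port on the build machine
    nc_timeout: int    # value passed to nc -w; semantics depend on the nc variant


def parse_forwards(conf: str) -> List[Forward]:
    """Extract the nc-based TCP forwards from inetd.conf-style text."""
    forwards = []
    for line in conf.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Fields: service, socket type, protocol, wait flag, user, program, argv...
        service, _sock, _proto, _wait, _user, _prog, *argv = line.split()
        # argv looks like: /bin/nc -w 10 172.19.189.11 7275
        timeout = int(argv[argv.index("-w") + 1])
        forwards.append(Forward(int(service), argv[-2], int(argv[-1]), timeout))
    return forwards


CONF = """\
# TCP-level forwards for SSH connections to build machines for the GNU
# Guix build farm:
7275    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.11 7275
7276    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.12 7276
7274    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.13 7274
"""

for fwd in parse_forwards(CONF):
    print(f"{fwd.listen_port} -> {fwd.target_host}:{fwd.target_port} (nc -w {fwd.nc_timeout})")
```

Each public port maps to the same port number on one private machine, and
every forward runs through a separate `nc` process with a 10-second `-w`
value.  Whether that value matters for long, quiet SSH sessions is an open
question here, not an established cause.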

       Mark


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-07-23 17:49       ` Mark H Weaver
@ 2019-07-23 21:26         ` Ludovic Courtès
  2019-07-23 21:55           ` Ricardo Wurmus
  2019-07-24 11:09           ` Mark H Weaver
  0 siblings, 2 replies; 17+ messages in thread
From: Ludovic Courtès @ 2019-07-23 21:26 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 36754

Hi Mark,

Mark H Weaver <mhw@netris.org> writes:

> I wrote earlier:
>> I now believe that these failures are related to the newly added armhf
>> build slaves, and that they have nothing to do with the recent changes
>> to our linux-libre packages.
>
> I should mention that the armhf build slaves are on a private network,
> and I use my public-facing internet server to forward TCP connections to
> them, using the following entries in /etc/inetd.conf:
>
> # TCP-level forwards for SSH connections to build machines for the GNU
> # Guix build farm:
> 7275    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.11 7275
> 7276    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.12 7276
> 7274    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.13 7274
>
> It's possible that this arrangement is somehow part of the problem.
> However, note that nothing has changed here in several years, and it
> worked fine on hydra.gnu.org.  The build slaves were running a *very*
> old version of Guix though.  It seems likely that the new Guile-SSH code
> doesn't cope well with this setup.

I noticed that connections to the machines were unstable (using
OpenSSH’s client).  That is, the connection would eventually “hang”,
apparently several times a day.

Currently we have an SSH tunnel set up on berlin to connect to each of
these machines via overdrive1.guixsd.org.  This setup proved to be
robust in the past (we used it to connect to another build machine), so
I suspect something’s wrong on “your” end of the network.  It’s hard to
tell exactly what, though.

Ideas?

If it’s causing build failures, I’m afraid we’ll have to comment out
those machines from berlin’s machines.scm until we’ve figured it out.

Thanks,
Ludo’.


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-07-23 21:26         ` bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds Ludovic Courtès
@ 2019-07-23 21:55           ` Ricardo Wurmus
  2019-08-01 15:39             ` Ricardo Wurmus
  2019-07-24 11:09           ` Mark H Weaver
  1 sibling, 1 reply; 17+ messages in thread
From: Ricardo Wurmus @ 2019-07-23 21:55 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 36754


Ludovic Courtès <ludo@gnu.org> writes:

> Currently we have an SSH tunnel set up on berlin to connect to each of
> these machines via overdrive1.guixsd.org.  This setup proved to be
> robust in the past (we used it to connect to another build machine), so
> I suspect something’s wrong on “your” end of the network.  It’s hard to
> tell exactly what, though.

FWIW by the end of this week we should have the firewall changes
implemented so we can do without the SSH tunnel.

--
Ricardo


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-07-23 21:26         ` bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds Ludovic Courtès
  2019-07-23 21:55           ` Ricardo Wurmus
@ 2019-07-24 11:09           ` Mark H Weaver
  2019-07-24 14:56             ` Ludovic Courtès
  1 sibling, 1 reply; 17+ messages in thread
From: Mark H Weaver @ 2019-07-24 11:09 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 36754

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> wrote:
> I noticed that connections to the machines were unstable (using
> OpenSSH’s client).  That is, the connection would eventually “hang”,
> apparently several times a day.
>
> Currently we have an SSH tunnel set up on berlin to connect to each of
> these machines via overdrive1.guixsd.org.  This setup proved to be
> robust in the past (we used it to connect to another build machine), so
> I suspect something’s wrong on “your” end of the network.  It’s hard to
> tell exactly what, though.
>
> Ideas?

Okay, I'll look into it.  I'm very busy with something else for the next
couple of days, but I'll try to get to it in the next week.

> If it’s causing build failures, I’m afraid we’ll have to comment out
> those machines from berlin’s machines.scm until we’ve figured it out.

Agreed.

     Thanks,
       Mark


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-07-24 11:09           ` Mark H Weaver
@ 2019-07-24 14:56             ` Ludovic Courtès
  2019-08-01 14:09               ` Marius Bakke
  0 siblings, 1 reply; 17+ messages in thread
From: Ludovic Courtès @ 2019-07-24 14:56 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 36754

Hello,

Mark H Weaver <mhw@netris.org> writes:

> Ludovic Courtès <ludo@gnu.org> wrote:
>> I noticed that connections to the machines were unstable (using
>> OpenSSH’s client).  That is, the connection would eventually “hang”,
>> apparently several times a day.
>>
>> Currently we have an SSH tunnel set up on berlin to connect to each of
>> these machines via overdrive1.guixsd.org.  This setup proved to be
>> robust in the past (we used it to connect to another build machine), so
>> I suspect something’s wrong on “your” end of the network.  It’s hard to
>> tell exactly what, though.
>>
>> Ideas?
>
> Okay, I'll look into it.  I'm very busy with something else for the next
> couple of days, but I'll try to get to it in the next week.

OK!

>> If it’s causing build failures, I’m afraid we’ll have to comment out
>> those machines from berlin’s machines.scm until we’ve figured it out.
>
> Agreed.

I’ve commented them out now.

Thanks,
Ludo’.


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-07-24 14:56             ` Ludovic Courtès
@ 2019-08-01 14:09               ` Marius Bakke
  2019-08-01 16:37                 ` Mark H Weaver
  0 siblings, 1 reply; 17+ messages in thread
From: Marius Bakke @ 2019-08-01 14:09 UTC (permalink / raw)
  To: Ludovic Courtès, Mark H Weaver; +Cc: 36754


Truncated log files seem to happen for other builds as well, even
within the Berlin data center.

https://ci.guix.gnu.org/log/n3ra1b8ic6qhfinnhb80mrn7snsqws9d-geocode-glib-3.26.0
https://ci.guix.gnu.org/log/zqhqlib00i8f7f10g4c2dfzprw16h4xv-scintilla-4.2.0
https://ci.guix.gnu.org/log/718jmbq94mvdgnmjyqgxgy7zaj8xzxk3-htslib-1.9

All of these builds are for i686-linux.

Mark: are the armhf nodes still operational?  I would like to re-enable
them again, since we desperately need the computing power with four huge
branches going concurrently at the moment.



* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-07-23 21:55           ` Ricardo Wurmus
@ 2019-08-01 15:39             ` Ricardo Wurmus
  0 siblings, 0 replies; 17+ messages in thread
From: Ricardo Wurmus @ 2019-08-01 15:39 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 36754


Ricardo Wurmus <rekado@elephly.net> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Currently we have an SSH tunnel set up on berlin to connect to each of
>> these machines via overdrive1.guixsd.org.  This setup proved to be
>> robust in the past (we used it to connect to another build machine), so
>> I suspect something’s wrong on “your” end of the network.  It’s hard to
>> tell exactly what, though.
>
> FWIW by the end of this week we should have the firewall changes
> implemented so we can do without the SSH tunnel.

The firewall changes have been applied today.

-- 
Ricardo


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-08-01 14:09               ` Marius Bakke
@ 2019-08-01 16:37                 ` Mark H Weaver
  2019-08-01 21:06                   ` Ricardo Wurmus
  0 siblings, 1 reply; 17+ messages in thread
From: Mark H Weaver @ 2019-08-01 16:37 UTC (permalink / raw)
  To: Marius Bakke; +Cc: 36754

Hi Marius,

Marius Bakke <mbakke@fastmail.com> wrote:

> Truncated log files seem to happen for other builds as well, even
> within the Berlin data center.
>
> https://ci.guix.gnu.org/log/n3ra1b8ic6qhfinnhb80mrn7snsqws9d-geocode-glib-3.26.0
> https://ci.guix.gnu.org/log/zqhqlib00i8f7f10g4c2dfzprw16h4xv-scintilla-4.2.0
> https://ci.guix.gnu.org/log/718jmbq94mvdgnmjyqgxgy7zaj8xzxk3-htslib-1.9
>
> All of these builds are for i686-linux.

Thanks, that's very useful information.

> Mark: are the armhf nodes still operational?

I assume so.  They all respond to pings anyway, and I haven't touched
them since before they were disconnected from Berlin.  (I would need to
boot up my other, more secure computer to try SSHing into them).

> I would like to re-enable them again, since we desperately need the
> computing power with four huge branches going concurrently at the
> moment.

I have no objection, but since Ludovic made the decision to disconnect
them, it would be good to hear from him first.

       Thanks,
         Mark


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-08-01 16:37                 ` Mark H Weaver
@ 2019-08-01 21:06                   ` Ricardo Wurmus
  2019-08-07 14:30                     ` Ricardo Wurmus
  0 siblings, 1 reply; 17+ messages in thread
From: Ricardo Wurmus @ 2019-08-01 21:06 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 36754


Mark H Weaver <mhw@netris.org> writes:

>> Mark: are the armhf nodes still operational?
>
> I assume so.  They all respond to pings anyway, and I haven't touched
> them since before they were disconnected from Berlin.  (I would need to
> boot up my other, more secure computer to try SSHing into them).
>
>> I would like to re-enable them again, since we desperately need the
>> computing power with four huge branches going concurrently at the
>> moment.
>
> I have no objection, but since Ludovic made the decision to disconnect
> them, it would be good to hear from him first.

Now that we should be able to SSH to them directly from Berlin we can
try connecting and perhaps upgrading the guix-daemon on these machines.

--
Ricardo


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-08-01 21:06                   ` Ricardo Wurmus
@ 2019-08-07 14:30                     ` Ricardo Wurmus
  2019-08-16 10:25                       ` Ludovic Courtès
  0 siblings, 1 reply; 17+ messages in thread
From: Ricardo Wurmus @ 2019-08-07 14:30 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 36754


Ricardo Wurmus <rekado@elephly.net> writes:

> Mark H Weaver <mhw@netris.org> writes:
>
>>> Mark: are the armhf nodes still operational?
>>
>> I assume so.  They all respond to pings anyway, and I haven't touched
>> them since before they were disconnected from Berlin.  (I would need to
>> boot up my other, more secure computer to try SSHing into them).
>>
>>> I would like to re-enable them again, since we desperately need the
>>> computing power with four huge branches going concurrently at the
>>> moment.
>>
>> I have no objection, but since Ludovic made the decision to disconnect
>> them, it would be good to hear from him first.
>
> Now that we should be able to SSH to them directly from Berlin we can
> try connecting and perhaps upgrading the guix-daemon on these machines.

I have removed the SSH tunnel configuration from /etc/guix/machines.scm
and re-enabled the machines.

Let’s see if this makes any difference.  If not we should try to upgrade
Guix on these build machines.

--
Ricardo


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-08-07 14:30                     ` Ricardo Wurmus
@ 2019-08-16 10:25                       ` Ludovic Courtès
  2019-09-12  8:41                         ` Ludovic Courtès
  0 siblings, 1 reply; 17+ messages in thread
From: Ludovic Courtès @ 2019-08-16 10:25 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 36754

Hi,

Ricardo Wurmus <rekado@elephly.net> writes:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> Mark H Weaver <mhw@netris.org> writes:
>>
>>>> Mark: are the armhf nodes still operational?
>>>
>>> I assume so.  They all respond to pings anyway, and I haven't touched
>>> them since before they were disconnected from Berlin.  (I would need to
>>> boot up my other, more secure computer to try SSHing into them).
>>>
>>>> I would like to re-enable them again, since we desperately need the
>>>> computing power with four huge branches going concurrently at the
>>>> moment.
>>>
>>> I have no objection, but since Ludovic made the decision to disconnect
>>> them, it would be good to hear from him first.
>>
>> Now that we should be able to SSH to them directly from Berlin we can
>> try connecting and perhaps upgrading the guix-daemon on these machines.
>
> I have removed the SSH tunnel configuration from /etc/guix/machines.scm
> and re-enabled the machines.
>
> Let’s see if this makes any difference.

Is it working well now?

> If not we should try to upgrade Guix on these build machines.

I think there’s a misunderstanding: these machines used to run a very
old Guix but I installed 1.0 from scratch before migrating them to
berlin.

Thanks,
Ludo’.


* bug#36754: SSH connections to hydra-slave{1, 2, 3} fail during builds
  2019-08-16 10:25                       ` Ludovic Courtès
@ 2019-09-12  8:41                         ` Ludovic Courtès
  0 siblings, 0 replies; 17+ messages in thread
From: Ludovic Courtès @ 2019-09-12  8:41 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 36754-done

Hello,

AFAICS we no longer have connection issues to
hydra-slave{1,2,3}.netris.org so I’m closing this bug.

Ludo’.


