all messages for Guix-related lists mirrored at yhetil.org
* CI is not processing jobs
@ 2024-05-31  6:25 Lars-Dominik Braun
  2024-06-01 14:09 ` Ludovic Courtès
  0 siblings, 1 reply; 15+ messages in thread
From: Lars-Dominik Braun @ 2024-05-31  6:25 UTC (permalink / raw)
  To: guix-devel

Hi,

I’d like to merge the haskell-team branch. Changes are rather small
this time, but I still need CI to build the branch to be able to assess
the damage done. Unfortunately Cuirass has not been processing scheduled
jobs of this evaluation[1] for about a week now. What can we do about
that?

Thanks,
Lars

[1] https://ci.guix.gnu.org/eval/1348713



* Re: CI is not processing jobs
  2024-05-31  6:25 CI is not processing jobs Lars-Dominik Braun
@ 2024-06-01 14:09 ` Ludovic Courtès
  2024-06-02 21:14   ` Ludovic Courtès
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2024-06-01 14:09 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: guix-devel

Hi Lars,

Lars-Dominik Braun <lars@6xq.net> wrote:

> I’d like to merge the haskell-team branch. Changes are rather small
> this time, but I still need CI to build the branch to be able to assess
> the damage done. Unfortunately Cuirass has not been processing scheduled
> jobs of this evaluation[1] for about a week now. What can we do about
> that?

I’ll deploy a fix to ‘cuirass remote-server’ in the coming days, which
should address this.

The longer story is that Ricardo noticed that the build backlog had been
growing for a couple of months (see “Pending builds”):

  https://ci.guix.gnu.org/metrics

We discussed this on guix-sysadmin and found that this was due to the
poor performance of a SQL query at the core of ‘cuirass remote-server’
that was roughly linear in the number of packages in the database.  With
help from Chris Baines, this is now fixed:

  https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=17338588d4862b04e9e405c1244a2ea703b50d98

I’ve been test-driving this and other changes on guix.bordeaux.inria.fr
for the last few days to gain some confidence.  The results were
conclusive, so I’ll go ahead and deploy it on ci.guix, hopefully this
weekend.

Thanks,
Ludo’.


* Re: CI is not processing jobs
  2024-06-01 14:09 ` Ludovic Courtès
@ 2024-06-02 21:14   ` Ludovic Courtès
  2024-06-06 15:05     ` Ludovic Courtès
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2024-06-02 21:14 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: guix-devel

Hi,

Ludovic Courtès <ludo@gnu.org> wrote:

> The longer story is that Ricardo noticed that the build backlog had been
> growing for a couple of months (see “Pending builds”):
>
>   https://ci.guix.gnu.org/metrics
>
> We discussed this on guix-sysadmin and found that this was due to the
> poor performance of a SQL query at the core of ‘cuirass remote-server’
> that was roughly linear in the number of packages in the database.  With
> help from Chris Baines, this is now fixed:
>
>   https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=17338588d4862b04e9e405c1244a2ea703b50d98

This is now deployed on ci.guix.gnu.org.

So far all the workers are kept busy.  We’ll have to see if it keeps up
over time.

Ludo’.


* Re: CI is not processing jobs
  2024-06-02 21:14   ` Ludovic Courtès
@ 2024-06-06 15:05     ` Ludovic Courtès
  2024-06-06 17:48       ` Andreas Enge
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Ludovic Courtès @ 2024-06-06 15:05 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: guix-devel

Hi Guix!

Ludovic Courtès <ludo@gnu.org> wrote:

> Ludovic Courtès <ludo@gnu.org> wrote:
>
>> The longer story is that Ricardo noticed that the build backlog had been
>> growing for a couple of months (see “Pending builds”):
>>
>>   https://ci.guix.gnu.org/metrics
>>
>> We discussed this on guix-sysadmin and found that this was due to the
>> poor performance of a SQL query at the core of ‘cuirass remote-server’
>> that was roughly linear in the number of packages in the database.  With
>> help from Chris Baines, this is now fixed:
>>
>>   https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=17338588d4862b04e9e405c1244a2ea703b50d98
>
> This is now deployed on ci.guix.gnu.org.

This has been running for ~4 days now; the number of pending builds has
significantly decreased (in particular, you’ll be delighted to get
substitutes for ‘core-updates’!):

  https://ci.guix.gnu.org/metrics

Almost all the x86 builds have been consumed:

--8<---------------cut here---------------start------------->8---
cuirass=> select count(*) from builds where status = -2 and system ='x86_64-linux';
 count 
-------
   748
(1 row)

cuirass=> select count(*) from builds where status = -2 and system ='i686-linux';
 count 
-------
     0
(1 row)

cuirass=> select count(*) from builds where status = -2 and system ='powerpc64le-linux';
 count  
--------
 110963
(1 row)

cuirass=> select count(*) from builds where status = -2 ;
 count  
--------
 382892
(1 row)
--8<---------------cut here---------------end--------------->8---

The “Pending builds” plot above shows we’re reaching a plateau: this is
because the 382,000+ remaining builds are non-x86_64 and we lack
resources for these platforms (Arm in particular).

We should probably investigate and cancel builds that correspond to old
evaluations or now-irrelevant jobsets.

Ludo’.


* Re: CI is not processing jobs
  2024-06-06 15:05     ` Ludovic Courtès
@ 2024-06-06 17:48       ` Andreas Enge
  2024-06-12  9:47         ` Andreas Enge
  2024-06-07  6:38       ` CI is not processing jobs Lars-Dominik Braun
  2024-06-16 14:22       ` Philip McGrath
  2 siblings, 1 reply; 15+ messages in thread
From: Andreas Enge @ 2024-06-06 17:48 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Lars-Dominik Braun, guix-devel

Hello,

On Thu, Jun 06, 2024 at 05:05:32PM +0200, Ludovic Courtès wrote:
> Almost all the x86 builds have been consumed:
> 
> --8<---------------cut here---------------start------------->8---
> cuirass=> select count(*) from builds where status = -2 and system ='x86_64-linux';
>  count 
> -------
>    748
> (1 row)
> 
> cuirass=> select count(*) from builds where status = -2 and system ='i686-linux';
>  count 
> -------
>      0
> (1 row)

This is good news, and thanks for sharing the detailed package counts!
Could the graph on
   https://ci.guix.gnu.org/metrics
be augmented by the number of packages to be built for the different
architectures?

Andreas



* Re: CI is not processing jobs
  2024-06-06 15:05     ` Ludovic Courtès
  2024-06-06 17:48       ` Andreas Enge
@ 2024-06-07  6:38       ` Lars-Dominik Braun
  2024-06-16 14:22       ` Philip McGrath
  2 siblings, 0 replies; 15+ messages in thread
From: Lars-Dominik Braun @ 2024-06-07  6:38 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Hi Ludo,

> This has been running for ~4 days now; the number of pending builds has
> significantly decreased (in particular, you’ll be delighted to get
> substitutes for ‘core-updates’!):

excellent, thank you very much!

Lars



* Re: CI is not processing jobs
  2024-06-06 17:48       ` Andreas Enge
@ 2024-06-12  9:47         ` Andreas Enge
  2024-06-12 14:50           ` Maxim Cournoyer
  2024-06-17 12:07           ` Little progress on powerpc64le and aarch64 builds on ci.guix Ludovic Courtès
  0 siblings, 2 replies; 15+ messages in thread
From: Andreas Enge @ 2024-06-12  9:47 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Thu, Jun 06, 2024 at 07:48:27PM +0200, Andreas Enge wrote:
> Could the graph on
>    https://ci.guix.gnu.org/metrics
> be augmented by the number of packages to be built for the different
> architectures?

In that direction, the metrics now show that very few packages were built
in the last 24 hours, except maybe for ARM (where we anyway build few
packages). But the number of waiting builds stalls at around 280000.

Are these all for ARM now? Should we cancel builds a bit more aggressively
to make sure that recent packages are favoured?

Or is there still a problem, since our ARM machines are also essentially idle?

Andreas



* Re: CI is not processing jobs
  2024-06-12  9:47         ` Andreas Enge
@ 2024-06-12 14:50           ` Maxim Cournoyer
  2024-06-17 12:07           ` Little progress on powerpc64le and aarch64 builds on ci.guix Ludovic Courtès
  1 sibling, 0 replies; 15+ messages in thread
From: Maxim Cournoyer @ 2024-06-12 14:50 UTC (permalink / raw)
  To: Andreas Enge; +Cc: Ludovic Courtès, guix-devel

Hi Andreas,

Andreas Enge <andreas@enge.fr> writes:

> On Thu, Jun 06, 2024 at 07:48:27PM +0200, Andreas Enge wrote:
>> Could the graph on
>>    https://ci.guix.gnu.org/metrics
>> be augmented by the number of packages to be built for the different
>> architectures?
>
> In that direction, the metrics now show that very few packages were built
> in the last 24 hours, except maybe for ARM (where we anyway build few
> packages). But the number of waiting builds stalls at around 280000.
>
> Are these all for ARM now? Should we cancel builds a bit more aggressively
> to make sure that recent packages are favoured?
>
> Or is there still a problem, since our ARM machines are also essentially idle?

I don't have any technical knowledge to bring to the topic, but I'd like
to add that it would be very useful to have the metrics already
available broken down per architecture, so we could track which ones are
lagging behind.

-- 
Thanks,
Maxim


* Re: CI is not processing jobs
  2024-06-06 15:05     ` Ludovic Courtès
  2024-06-06 17:48       ` Andreas Enge
  2024-06-07  6:38       ` CI is not processing jobs Lars-Dominik Braun
@ 2024-06-16 14:22       ` Philip McGrath
  2024-06-17 12:11         ` qa.guix delays in processing patches Ludovic Courtès
  2 siblings, 1 reply; 15+ messages in thread
From: Philip McGrath @ 2024-06-16 14:22 UTC (permalink / raw)
  To: Ludovic Courtès, Lars-Dominik Braun; +Cc: Brian Cully

Hi,

On Thu, Jun 6, 2024, at 11:05 AM, Ludovic Courtès wrote:
> Hi Guix!
>
> Ludovic Courtès <ludo@gnu.org> wrote:
>
>> Ludovic Courtès <ludo@gnu.org> wrote:
>>
>>> The longer story is that Ricardo noticed that the build backlog had been
>>> growing for a couple of months (see “Pending builds”):
>>>
>>>   https://ci.guix.gnu.org/metrics
>>>
>>> We discussed this on guix-sysadmin and found that this was due to the
>>> poor performance of a SQL query at the core of ‘cuirass remote-server’
>>> that was roughly linear in the number of packages in the database.  With
>>> help from Chris Baines, this is now fixed:
>>>
>>>   https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=17338588d4862b04e9e405c1244a2ea703b50d98
>>
>> This is now deployed on ci.guix.gnu.org.
>
> This has been running for ~4 days now; the number of pending builds has
> significantly decreased (in particular, you’ll be delighted to get
> substitutes for ‘core-updates’!):
>
>   https://ci.guix.gnu.org/metrics
>
> Almost all the x86 builds have been consumed:
>
> ...
>
> The “Pending builds” plot above shows we’re reaching a plateau: this is
> because the 382,000+ remaining builds are non-x86_64 and we lack
> resources for these platforms (Arm in particular).
>
> We should probably investigate and cancel builds that correspond to old
> evaluations or now-irrelevant jobsets.
>

For some reason QA still doesn't seem to be working for https://issues.guix.gnu.org/71203 (a Racket update I sent on May 26), which I suspect may be related to this. Could someone take a look?

The page at https://qa.guix.gnu.org/issue/71203 says "Issue not found: This could mean the issue does not exist, it has no patches or has been closed." The page at https://data.qa.guix.gnu.org/repository/1/branch/issue-71203 does recognize the branch, but says "No information yet". The linked commit page at https://data.qa.guix.gnu.org/revision/2805bc613df06a726035ba19e9b60762487963ef has one entry in the "Jobs" table, "created 2024-05-26 06:07:51.458233".

I have only surface-level familiarity with the QA system, so sorry if I'm missing something!

Thanks,
Philip


* Little progress on powerpc64le and aarch64 builds on ci.guix
  2024-06-12  9:47         ` Andreas Enge
  2024-06-12 14:50           ` Maxim Cournoyer
@ 2024-06-17 12:07           ` Ludovic Courtès
  1 sibling, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2024-06-17 12:07 UTC (permalink / raw)
  To: Andreas Enge; +Cc: guix-devel

Hello!

Andreas Enge <andreas@enge.fr> wrote:

> On Thu, Jun 06, 2024 at 07:48:27PM +0200, Andreas Enge wrote:
>> Could the graph on
>>    https://ci.guix.gnu.org/metrics
>> be augmented by the number of packages to be built for the different
>> architectures?

That would be nice, I agree (I haven’t looked much at that part of the
code).

> In that direction, the metrics now show that very few packages were built
> in the last 24 hours, except maybe for ARM (where we anyway build few
> packages). But the number of waiting builds stalls at around 280000.
>
> Are these all for ARM now? Should we cancel builds a bit more aggressively
> to make sure that recent packages are favoured?

In the meantime, here’s me doing stats-as-a-service:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo -u cuirass psql cuirass
cuirass=> select count(*) from builds where status = -2 ;
 count  
--------
 284314
(1 row)

Time: 635.478 ms
cuirass=> select count(*) from builds where status = -2 and system = 'x86_64-linux';
 count 
-------
     0
(1 row)

Time: 761.333 ms
cuirass=> select count(*) from builds where status = -2 and system = 'aarch64-linux';
 count  
--------
 160847
(1 row)

Time: 661.968 ms
cuirass=> select count(*) from builds where status = -2 and system = 'powerpc64le-linux';
 count  
--------
 119124
(1 row)

Time: 589.800 ms
cuirass=> select count(*) from builds where status = -2 and system = 'armhf-linux';
 count 
-------
  4343
(1 row)

Time: 549.242 ms
cuirass=> select count(*) from builds where status = -2 and system = 'i686-linux';
 count 
-------
     0
(1 row)

Time: 1088.130 ms (00:01.088)
--8<---------------cut here---------------end--------------->8---

So lots of AArch64 and POWER9 builds.
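
The per-system counts above were obtained with one query per architecture.
As a minimal sketch of the per-architecture breakdown requested earlier in
the thread, the same numbers can be produced in a single pass with GROUP BY.
The example below runs against an in-memory SQLite table that is only a
hypothetical stand-in for the Cuirass ‘builds’ table: it models just the
‘status’ and ‘system’ columns used in the queries above, and the rows are
made up for illustration.

```python
import sqlite3


def pending_by_system(conn):
    """Return [(system, count)] for builds with status -2, largest first."""
    return conn.execute(
        "SELECT system, COUNT(*) AS n FROM builds "
        "WHERE status = -2 GROUP BY system ORDER BY n DESC, system"
    ).fetchall()


# Hypothetical, simplified stand-in for the real table (assumption: only
# the two columns that appear in the queries above are modelled).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builds (status INTEGER, system TEXT)")
sample = (
    [(-2, "aarch64-linux")] * 3
    + [(-2, "powerpc64le-linux")] * 2
    + [(-2, "armhf-linux")]
    + [(0, "x86_64-linux")] * 4  # a different status, so not counted
)
conn.executemany("INSERT INTO builds VALUES (?, ?)", sample)

rows = pending_by_system(conn)
for system, n in rows:
    print(f"{system:20} {n}")
```

Note that a system with no matching rows (x86_64-linux in this sample) is
simply absent from the result instead of being reported with a count of 0.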

Executive summary:

  1. Of all the AArch64 build machines we have, only ‘overdrive1’ is
     currently actually contributing build power;

  2. AArch64 build machines ‘pankow’, ‘grunewald’, and ‘kreuzberg’
     (HoneyCombs) need on-site intervention so we can reconfigure them
     and reboot them.

  3. Some other AArch64 build machines (‘lieserl’ and ‘monokuma’) have
     been off for months and we’re discussing on guix-sysadmin ways to
     turn them back on;

  4. POWER9, I’m not sure.

  5. ‘cuirass remote-server’ may be too slow at handling incoming
     messages from workers, leading to redundant builds and the
     impression on https://ci.guix.gnu.org/workers that workers are
     idle, even when they’re in fact busy building stuff.


Investigation details:

I noticed that ‘cuirass remote-server’ on berlin would all too often
consider workers as “unresponsive” (meaning that it hasn’t received a
‘ping’ message from them in the past 2 minutes):

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo grep unresponsive /var/log/cuirass-remote-server.log |tail -10
2024-06-17 12:44:02 restarted 1 builds that were on unresponsive workers
2024-06-17 12:50:03 restarted 1 builds that were on unresponsive workers
2024-06-17 12:55:03 restarted 1 builds that were on unresponsive workers
2024-06-17 13:01:03 restarted 3 builds that were on unresponsive workers
2024-06-17 13:08:03 restarted 1 builds that were on unresponsive workers
2024-06-17 13:20:03 restarted 1 builds that were on unresponsive workers
2024-06-17 13:22:03 restarted 4 builds that were on unresponsive workers
2024-06-17 13:24:03 restarted 2 builds that were on unresponsive workers
2024-06-17 13:29:03 restarted 1 builds that were on unresponsive workers
2024-06-17 13:33:03 restarted 3 builds that were on unresponsive workers
--8<---------------cut here---------------end--------------->8---

As shown in this log, the effect is that some builds get restarted, even
though they are still being built by a worker that was wrongfully
considered unresponsive.
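
To get a feel for how often this happens, the restart lines can be tallied.
The sketch below is not part of Cuirass; it only assumes the log line format
shown in the excerpt above and reuses those ten lines as input.

```python
import re

# Matches lines such as:
#   2024-06-17 13:01:03 restarted 3 builds that were on unresponsive workers
RESTART_RE = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"restarted (?P<count>\d+) builds that were on unresponsive workers$"
)


def restart_events(lines):
    """Return [(timestamp, count)] for every restart line found in `lines`."""
    events = []
    for line in lines:
        match = RESTART_RE.match(line.strip())
        if match:
            events.append((match.group("when"), int(match.group("count"))))
    return events


# The ten log lines quoted above.
LOG = """\
2024-06-17 12:44:02 restarted 1 builds that were on unresponsive workers
2024-06-17 12:50:03 restarted 1 builds that were on unresponsive workers
2024-06-17 12:55:03 restarted 1 builds that were on unresponsive workers
2024-06-17 13:01:03 restarted 3 builds that were on unresponsive workers
2024-06-17 13:08:03 restarted 1 builds that were on unresponsive workers
2024-06-17 13:20:03 restarted 1 builds that were on unresponsive workers
2024-06-17 13:22:03 restarted 4 builds that were on unresponsive workers
2024-06-17 13:24:03 restarted 2 builds that were on unresponsive workers
2024-06-17 13:29:03 restarted 1 builds that were on unresponsive workers
2024-06-17 13:33:03 restarted 3 builds that were on unresponsive workers
""".splitlines()

events = restart_events(LOG)
total = sum(count for _, count in events)
# prints: 10 restart events, 18 builds restarted
print(f"{len(events)} restart events, {total} builds restarted")
```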

This needs further investigation.  The SQL query for
‘db-get-pending-build’ fixed by Cuirass commit
17338588d4862b04e9e405c1244a2ea703b50d98 is no longer at fault: it’s now
reasonably fast (there’s a warning in ‘cuirass-remote-server.log’ if it
ever takes more than 10s).  It could be that the backlog of incoming
messages in ‘remote-server’ still keeps increasing though, since workers
send pings every minute no matter what.
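
The backlog hypothesis can be illustrated with a toy model.  This is only a
sketch under stated assumptions, not how ‘cuirass remote-server’ is
implemented: it assumes pings are sent at t = 0, 60, 120, ... seconds, that
each ping is handled a fixed ‘lag’ seconds after it was sent, and that the
server compares the current time with the send time of the newest ping it
has handled.  The names ‘lag’, ‘looks_unresponsive’ and ‘flagged_times’ are
hypothetical.

```python
PING_INTERVAL = 60  # seconds; workers send a ping every minute
TIMEOUT = 120       # seconds; "no ping in the past 2 minutes"


def last_processed_ping(now, lag):
    """Send time of the newest ping handled by `now`, or None if none yet.

    Assumption: a ping sent at time t is handled at time t + lag.
    """
    if now < lag:
        return None
    return ((now - lag) // PING_INTERVAL) * PING_INTERVAL


def looks_unresponsive(now, lag):
    """True if, from the server's point of view, the worker has timed out."""
    sent = last_processed_ping(now, lag)
    return sent is None or now - sent > TIMEOUT


def flagged_times(lag, horizon=600, step=10):
    """Sample times at which a healthy worker would be flagged."""
    return [t for t in range(lag, horizon, step) if looks_unresponsive(t, lag)]


print(flagged_times(0))        # no lag: the worker is never flagged
print(flagged_times(90)[:4])   # 90 s lag: flagged in part of every minute
```

In this model a worker that pings on schedule is never flagged while the lag
stays at or below one ping interval, and it starts to be flagged periodically
once the lag exceeds TIMEOUT minus PING_INTERVAL.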

A further problem is that we’re unable to retrieve binaries from a
couple of build machines:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo grep error: /var/log/cuirass-remote-server.log |tail -10
2024-06-17 13:05:21 error: failed to add /gnu/store/f96ya7x7yjns39n8np16rmnhzarqcchd-guix-78d385a6b to store: path `/gnu/store/f96ya7x7yjns39n8np16rmnhzarqcchd-guix-78d385a6b' does not exist and cannot be created
2024-06-17 13:05:21 error: The remote-worker signing key might be unauthorized.
2024-06-17 13:05:21 error: failed to add /gnu/store/f96ya7x7yjns39n8np16rmnhzarqcchd-guix-78d385a6b to store: path `/gnu/store/f96ya7x7yjns39n8np16rmnhzarqcchd-guix-78d385a6b' does not exist and cannot be created
2024-06-17 13:05:21 error: The remote-worker signing key might be unauthorized.
2024-06-17 13:05:21 error: failed to add /gnu/store/f96ya7x7yjns39n8np16rmnhzarqcchd-guix-78d385a6b to store: path `/gnu/store/f96ya7x7yjns39n8np16rmnhzarqcchd-guix-78d385a6b' does not exist and cannot be created
2024-06-17 13:05:21 error: The remote-worker signing key might be unauthorized.
2024-06-17 13:17:29 error: failed to add /gnu/store/ljhvgbblb4y7554rg542vam5hp8rg9mg-ocaml-bos-0.2.1 to store: path `/gnu/store/ljhvgbblb4y7554rg542vam5hp8rg9mg-ocaml-bos-0.2.1' does not exist and cannot be created
2024-06-17 13:17:29 error: The remote-worker signing key might be unauthorized.
2024-06-17 13:24:03 error: failed to add /gnu/store/vb57h47b5xpin1h0rrvh9qd2bxapy8f7-ocaml-uucp-15.0.0 to store: path `/gnu/store/vb57h47b5xpin1h0rrvh9qd2bxapy8f7-ocaml-uucp-15.0.0' does not exist and cannot be created
2024-06-17 13:24:03 error: The remote-worker signing key might be unauthorized.
--8<---------------cut here---------------end--------------->8---

By picking store items from these error messages, we can determine that
at least ‘pankow’ (10.0.0.8, AArch64) and ‘grunewald’ (10.0.0.10,
AArch64) are at fault:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ guix gc --derivers /gnu/store/vb57h47b5xpin1h0rrvh9qd2bxapy8f7-ocaml-uucp-15.0.0
/gnu/store/8yc7j6q169f8312wx6jxs7g0z4xy5l5l-ocaml-uucp-15.0.0.drv
ludo@berlin ~$ sudo grep 8yc7j6q169f8312wx6jxs7g0z4xy5l5l /var/log/cuirass-remote-server.log |tail -10
2024-06-17 13:21:50 10.0.0.8 (uUTl7MVR): build started: '/gnu/store/8yc7j6q169f8312wx6jxs7g0z4xy5l5l-ocaml-uucp-15.0.0.drv'.
2024-06-17 13:24:03 fetching 1 outputs of '/gnu/store/8yc7j6q169f8312wx6jxs7g0z4xy5l5l-ocaml-uucp-15.0.0.drv' from http://10.0.0.8:5558
2024-06-17 13:24:03 build succeeded: '/gnu/store/8yc7j6q169f8312wx6jxs7g0z4xy5l5l-ocaml-uucp-15.0.0.drv'
ludo@berlin ~$ guix gc --derivers /gnu/store/f96ya7x7yjns39n8np16rmnhzarqcchd-guix-78d385a6b
/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv
ludo@berlin ~$ sudo grep ygrgwp9jyksjpnd76b83ifdskbcdjbhh /var/log/cuirass-remote-server.log  |tail -10
2024-06-17 13:05:21 fetching 1 outputs of '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv' from http://10.0.0.8:5558
2024-06-17 13:05:21 build succeeded: '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv'
2024-06-17 13:05:21 build succeeded: '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv'
2024-06-17 13:05:21 build succeeded: '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv'
2024-06-17 13:05:21 build succeeded: '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv'
2024-06-17 13:34:39 build failed: '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv'
2024-06-17 13:41:08 fetching 1 outputs of '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv' from http://10.0.0.10:5558
2024-06-17 13:41:08 fetching 1 outputs of '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv' from http://10.0.0.10:5558
2024-06-17 13:41:09 build succeeded: '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv'
2024-06-17 13:41:09 build succeeded: '/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv'
--8<---------------cut here---------------end--------------->8---
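
The last step of this lookup, going from a derivation to the workers its
outputs were fetched from, can be sketched as follows.  The code is not part
of Cuirass; it only assumes the format of the "fetching ... outputs of"
lines shown above and uses three of them as input.  Mapping a store item to
its derivation still requires ‘guix gc --derivers’, as in the transcript.

```python
import re
from collections import defaultdict

FETCH_RE = re.compile(
    r"fetching \d+ outputs of '(?P<drv>/gnu/store/[^']+\.drv)' "
    r"from http://(?P<host>[0-9.]+):(?P<port>\d+)"
)


def workers_by_derivation(lines):
    """Map each derivation to the set of worker addresses it was fetched from."""
    result = defaultdict(set)
    for line in lines:
        match = FETCH_RE.search(line)
        if match:
            result[match.group("drv")].add(match.group("host"))
    return dict(result)


UUCP = "/gnu/store/8yc7j6q169f8312wx6jxs7g0z4xy5l5l-ocaml-uucp-15.0.0.drv"
GUIX = "/gnu/store/ygrgwp9jyksjpnd76b83ifdskbcdjbhh-guix-78d385a6b.drv"

# Three of the log lines quoted above.
LOG = [
    f"2024-06-17 13:24:03 fetching 1 outputs of '{UUCP}' from http://10.0.0.8:5558",
    f"2024-06-17 13:05:21 fetching 1 outputs of '{GUIX}' from http://10.0.0.8:5558",
    f"2024-06-17 13:41:08 fetching 1 outputs of '{GUIX}' from http://10.0.0.10:5558",
]

workers = workers_by_derivation(LOG)
for drv, hosts in workers.items():
    print(drv, sorted(hosts))
```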

The signing key of ‘grunewald’ is definitely registered:

--8<---------------cut here---------------start------------->8---
$ ssh grunewald cat /etc/guix/signing-key.pub
(public-key 
 (ecc 
  (curve Ed25519)
  (q #370A0165E60213CA122E026402EE3DEA61FE4E4EE27D16DA44044AA49714D481#)
  )
 )
$ grep -rl 370A0165E60213CA122E026402EE3DEA61FE4E4EE27D16DA44044AA49714D481 ~/src/guix-maintenance/hydra/
$ ssh berlin grep 370A0165E60213CA122E026402EE3DEA61FE4E4EE27D16DA44044AA49714D481 /etc/guix/acl
    (q #370A0165E60213CA122E026402EE3DEA61FE4E4EE27D16DA44044AA49714D481#)
--8<---------------cut here---------------end--------------->8---

That of ‘pankow’ I can’t say because I cannot log in.  Most likely, it
rebooted and generated a new signing key different from the one that’s
registered.  So in effect, ‘pankow’ is not contributing any builds.

The third machine of the HoneyComb family is ‘kreuzberg’: it’s been off
for a few days, after I rebooted it and it didn’t come back.

Thanks,
Ludo’.

PS: I’m traveling this week so I won’t be very responsive.


* qa.guix delays in processing patches
  2024-06-16 14:22       ` Philip McGrath
@ 2024-06-17 12:11         ` Ludovic Courtès
  2024-06-19  4:45           ` Philip McGrath
  2024-06-19 13:50           ` Christopher Baines
  0 siblings, 2 replies; 15+ messages in thread
From: Ludovic Courtès @ 2024-06-17 12:11 UTC (permalink / raw)
  To: Philip McGrath; +Cc: Lars-Dominik Braun, Brian Cully, Christopher Baines

Hi Philip,

"Philip McGrath" <philip@philipmcgrath.com> wrote:

> For some reason QA still doesn't seem to be working for https://issues.guix.gnu.org/71203 (a Racket update I sent on May 26), which I suspect may be related to this. Could someone take a look?
>
> The page at https://qa.guix.gnu.org/issue/71203 says "Issue not found: This could mean the issue does not exist, it has no patches or has been closed." The page at https://data.qa.guix.gnu.org/repository/1/branch/issue-71203 does recognize the branch, but says "No information yet". The linked commit page at https://data.qa.guix.gnu.org/revision/2805bc613df06a726035ba19e9b60762487963ef has one entry in the "Jobs" table, "created 2024-05-26 06:07:51.458233".
>
> I have only surface-level familiarity with the QA system, so sorry if I'm missing something!

As you may know, qa.guix is using separate hardware and software
infrastructure from ci.guix, based on the Build Coordinator and the Data
Service.

It’s unclear to me why issues are sometimes seemingly not picked up.
Chris, do you have more insight into this?

Anyway, there’s a trick: Philip, if you rebase your patch series and
resend it, qa.guix is likely to consider it again and to spawn builds.
Please consider doing that!

Ludo’.


* Re: qa.guix delays in processing patches
  2024-06-17 12:11         ` qa.guix delays in processing patches Ludovic Courtès
@ 2024-06-19  4:45           ` Philip McGrath
  2024-06-19 13:50           ` Christopher Baines
  1 sibling, 0 replies; 15+ messages in thread
From: Philip McGrath @ 2024-06-19  4:45 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Lars-Dominik Braun, guix-devel, Christopher Baines

Hi Ludo’,

On Mon, Jun 17, 2024, at 8:11 AM, Ludovic Courtès wrote:
> Hi Philip,
>
> "Philip McGrath" <philip@philipmcgrath.com> wrote:
>
>> For some reason QA still doesn't seem to be working for https://issues.guix.gnu.org/71203 (a Racket update I sent on May 26), which I suspect may be related to this. Could someone take a look?
>>
>> [...]
>>
>> I have only surface-level familiarity with the QA system, so sorry if I'm missing something!
>
> As you may know, qa.guix is using separate hardware and software
> infrastructure from ci.guix, based on the Build Coordinator and the Data
> Service.
>

Oh, I totally missed that distinction! I know both Cuirass and the Build Coordinator/Data Service exist, but I hadn't thought about which provided each service, and I guess I was unconsciously thinking of "CI" and "QA" as synonyms.

>
> Anyway, there’s a trick: Philip, if you rebase your patch series and
> resend it, qa.guix is likely to consider it again and to spawn builds.
> Please consider doing that!
>

Thanks! I did that about 36 hours ago, and https://qa.guix.gnu.org/issue/71203 indeed recognized it, though it still is in the "yet to process revision" state.

> It’s unclear to me why issues are sometimes seemingly not picked up.
> Chris, do you have more insight into this?

I should have mentioned this in my last email, but, when I first sent the patch series, it *had* been picked up as https://qa.guix.gnu.org/issue/71203 and gotten to the "yet to process revision" state. I checked in on it a few times, and it remained in that state: only the last time (before rebasing) did it say "issue not found", though I'm not sure how long it had been since I'd last checked. I guess one thing I'm realizing is I don't know how long is normal for QA to take vs. when I should report a potential problem.

Thanks,
Philip


* Re: qa.guix delays in processing patches
  2024-06-17 12:11         ` qa.guix delays in processing patches Ludovic Courtès
  2024-06-19  4:45           ` Philip McGrath
@ 2024-06-19 13:50           ` Christopher Baines
  2024-06-25 20:07             ` Philip McGrath
  1 sibling, 1 reply; 15+ messages in thread
From: Christopher Baines @ 2024-06-19 13:50 UTC (permalink / raw)
  To: Ludovic Courtès
  Cc: Philip McGrath, Lars-Dominik Braun, Brian Cully,
	Christopher Baines

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Philip,
>
> "Philip McGrath" <philip@philipmcgrath.com> wrote:
>
>> For some reason QA still doesn't seem to be working for
>> https://issues.guix.gnu.org/71203 (a Racket update I sent on May
>> 26), which I suspect may be related to this. Could someone take a
>> look?
>>
>> The page at https://qa.guix.gnu.org/issue/71203 says "Issue not
>> found: This could mean the issue does not exist, it has no patches
>> or has been closed." The page at
>> https://data.qa.guix.gnu.org/repository/1/branch/issue-71203 does
>> recognize the branch, but says "No information yet". The linked
>> commit page at
>> https://data.qa.guix.gnu.org/revision/2805bc613df06a726035ba19e9b60762487963ef
>> has one entry in the "Jobs" table, "created 2024-05-26
>> 06:07:51.458233".
>>
>> I have only surface-level familiarity with the QA system, so sorry if I'm missing something!
>
> As you may know, qa.guix is using separate hardware and software
> infrastructure from ci.guix, based on the Build Coordinator and the Data
> Service.
>
> It’s unclear to me why issues are sometimes seemingly not picked up.
> Chris, do you have more insight into this?

QA only looks at a small number of the latest series [1] and their
associated issues, so I'm guessing that in this case the patch series was
old enough for QA not to be looking at it. This is mostly due to disk space
limitations for data.qa.guix.gnu.org.

1: https://git.savannah.gnu.org/cgit/guix/qa-frontpage.git/tree/scripts/guix-qa-frontpage.in#n154

Unfortunately the messaging is rather poor in this circumstance.

* Re: qa.guix delays in processing patches
  2024-06-19 13:50           ` Christopher Baines
@ 2024-06-25 20:07             ` Philip McGrath
  2024-06-27 13:32               ` Andreas Enge
  0 siblings, 1 reply; 15+ messages in thread
From: Philip McGrath @ 2024-06-25 20:07 UTC (permalink / raw)
  To: Christopher Baines, Ludovic Courtès
  Cc: Lars-Dominik Braun, guix-devel, Christopher Baines

Hi Chris,

On 6/19/24 09:50, Christopher Baines wrote:
> Ludovic Courtès <ludo@gnu.org> writes:
>>
>> It’s unclear to me why issues are sometimes seemingly not picked up.
>> Chris, do you have more insight into this?
> 
> QA only looks at a small number of the latest series [1] and their
> associated issues, so I'm guessing that in this case the patch series was
> old enough for QA not to be looking at it. This is mostly due to disk space
> limitations for data.qa.guix.gnu.org.
> 
> 1: https://git.savannah.gnu.org/cgit/guix/qa-frontpage.git/tree/scripts/guix-qa-frontpage.in#n154
> 
> Unfortunately the messaging is rather poor in this circumstance.

I'm not sure this explains why the job was dropped after having been 
picked up, as I (belatedly) mentioned in my reply to Ludo’:

https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00203.html

> 
> I should have mentioned this in my last email, but, when I first sent the patch 
> series, it *had* been picked up as https://qa.guix.gnu.org/issue/71203 and 
> gotten to the "yet to process revision" state. I checked in on it a few times, 
> and it remained in that state: only the last time (before rebasing) did it say 
> "issue not found", though I'm not sure how long it had been since I'd last 
> checked. I guess one thing I'm realizing is I don't know how long is normal for 
> QA to take vs. when I should report a potential problem.

On that last point, it's now been over a week since I rebased the 
series, and <https://qa.guix.gnu.org/issue/71203> is still in the "yet 
to process revision" state. Is this normal?

At least a few revisions created later seem to have reached "success":

https://data.qa.guix.gnu.org/revision/003695a6a66ab2a69506d2f5a689170ccc340505

https://data.qa.guix.gnu.org/revision/d0e425b0f538a8762e3199f5223597835cfe75da

(The latter seems to have "start"ed after the patch had already been merged.)

It's not clear to me how jobs are ordered in the queue, which makes it 
hard to tell if this is normal processing time or if something might be 
going wrong again.

I think there can be a lot of value in QA, especially in catching 
regressions on architectures I don't have available. But, even ignoring 
the additional delay waiting for the May 26 job that disappeared, this 
feels to me like a disproportionate overhead for a routine package update.

Thanks,
Philip


* Re: qa.guix delays in processing patches
  2024-06-25 20:07             ` Philip McGrath
@ 2024-06-27 13:32               ` Andreas Enge
  0 siblings, 0 replies; 15+ messages in thread
From: Andreas Enge @ 2024-06-27 13:32 UTC (permalink / raw)
  To: Philip McGrath
  Cc: Christopher Baines, Ludovic Courtès, Lars-Dominik Braun,
	guix-devel, Christopher Baines

On Tue, Jun 25, 2024 at 04:07:55PM -0400, Philip McGrath wrote:
> It's not clear to me how jobs are ordered in the queue, which makes it hard
> to tell if this is normal processing time or if something might be going
> wrong again.

It is "last in, first out": newest patches can hide older ones. It has
happened to me in the past that patches "disappeared" due to a constant
flow of newer ones. The reverse order would ensure that all patches are
eventually treated, but with an ever growing backlog as long as there are
not enough resources to handle them all.
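
The trade-off between the two orderings can be seen in a toy simulation.
This is a sketch only, not the qa-frontpage code: it assumes a made-up load
where two patches arrive per tick and one can be processed per tick, and the
function name and parameters are hypothetical.

```python
from collections import deque


def simulate(policy, ticks, arrivals=2, capacity=1):
    """Return (processed, still_waiting) as lists of patch ids.

    Each tick, `arrivals` new patches join the queue and up to `capacity`
    are processed.  Ids increase with submission time, so 0 is the oldest.
    `policy` is "lifo" (newest first) or "fifo" (oldest first).
    """
    queue = deque()
    processed = []
    next_id = 0
    for _ in range(ticks):
        for _ in range(arrivals):
            queue.append(next_id)
            next_id += 1
        for _ in range(min(capacity, len(queue))):
            if policy == "lifo":
                processed.append(queue.pop())
            else:
                processed.append(queue.popleft())
    return processed, list(queue)


lifo_done, lifo_waiting = simulate("lifo", ticks=4)
fifo_done, fifo_waiting = simulate("fifo", ticks=4)
print("LIFO processed", lifo_done, "waiting", lifo_waiting)
print("FIFO processed", fifo_done, "waiting", fifo_waiting)
```

Both policies leave the same number of patches waiting after four ticks.
With "lifo" the oldest patch (id 0) is still unprocessed and keeps being
overtaken, whereas with "fifo" it is handled first and the newest patches
wait instead.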

We could argue which of the two solutions is better, but to attack the
root cause we need more resources (machines, but also people).

Chris's effort to create a QA "circle" goes in a good direction.

Andreas


