From: Gábor Boskovits
Subject: Re: Merging staging?
Date: Fri, 21 Dec 2018 10:06:37 +0100
To: Mark H Weaver
Cc: Guix-devel
In-Reply-To: <8736qradap.fsf@netris.org>
References: <87o99gvyce.fsf@netris.org> <6AD6EA9A-7634-4A8E-8A59-F3D35BC0F82A@lepiller.eu> <8736qradap.fsf@netris.org>
List-Id: "Development of GNU Guix and the GNU System distribution."

Hello,

On Fri, 21 Dec 2018 at 2:18, Mark H Weaver <mhw@netris.org> wrote:

> Hi Julien,
>
> I've rearranged your reply from "top-posting" style to "bottom-posting"
> style.  Please consider using bottom-posting in the future.
>
> I wrote:
>
> > Julien Lepiller <julien@lepiller.eu> writes:
> >
> >> I'd like to get staging merged soon, as it wasn't for quite some
> >> time. Here are some stats about the current state of substitutes for
> >> staging:
> >>
> >> According to guix weather, we have:
> >>
> >> | architecture | berlin | hydra |
> >> +--------------+--------+-------+
> >> | x86_64       | 36.5%  | 81.7% |
> >> | i686         | 23.8%  | 71.0% |
> >> | aarch64      | 22.2%  | 00.0% |
> >> | armhf        | 17.0%  | 45.6% |
> >>
> >> What should the next step be?
> >
> > I think we should wait until the coverage on armhf and aarch64 has
> > become larger, for the sake of users on those systems.
> >
> > Also, I've seen some commits that make me wonder if hydra is still
> > being configured as an authorized substitute server on new Guix
> > installations.
> > Do you know?
> >
> > If 'berlin' is the only substitute server by default, then we certainly
> > need to wait for those numbers to get higher, no?
> >
> > What do you think?
>
> Julien Lepiller <julien@lepiller.eu> responded:
>
> > I agree, but I wonder if there is a reason for these to be so low?
>
> It's a good question.  I have several hypotheses:
>
> * Unfortunately, it is fairly common for builds for important core
>   packages to spuriously fail, often due to unreliable test suites, and
>   to cause thousands of other important dependent packages to fail.
>   When this happens on Hydra, I can see what's going on, and restart the
>   build and all of its dependents.

This is currently a problem: we can't see which dependency caused the
dependency failure.

>   I wouldn't be surprised if some important core packages spuriously
>   failed to build on Berlin, but we have no effective way to see what
>   happened there.  If that's the case, the 'guix weather' numbers above
>   might never get much higher no matter how long we wait.
>
> * Berlin's build slots may have been occupied for long periods of time
>   by 'test.*' jobs stuck in an endless "waiting for udevd..." loop, as
>   described in <https://bugs.gnu.org/33362>.
>
>   Hydra's web interface allows me to monitor active jobs and manually
>   kill those stuck jobs when I find them.  I don't know how to do that
>   on Berlin.
>
> * Especially on armhf and aarch64, where Berlin has very little build
>   capacity, and new builds are being added to Berlin's build queue much
>   faster than they can be built, it is quite possible that Berlin is
>   spending most of its effort on long-outdated builds.
>
>   On Hydra, I can see when this is happening, and often intervene by
>   cancelling large numbers of outdated builds on armhf, so that it
>   remains focused on the most popular and up-to-date packages.

We are currently missing an admin interface on berlin, and we need one,
since canceling a job should be a privileged operation.

> * On WIP branches like 'core-updates' and 'staging', when a new
>   evaluation is done, I cancel all outdated Hydra jobs on those
>   branches.  I don't know if anything similar is done on Berlin.
>
> In summary, there are several things that I regularly do to make
> efficient use of Hydra's limited build capacity.  I periodically look at
> Berlin's web interface to see how it has progressed, but it is currently
> mostly a black box to me.  I see no effective way to focus its limited
> resources on the most important builds, or to see when build slots are
> stuck.
>
>      Regards,
>        Mark

I am currently looking into how to improve the situation. Suggestions
are welcome.

Gábor