Hello,

On Fri, Dec 21, 2018, 2:18, Mark H Weaver <mhw@netris.org> wrote:
Hi Julien,

I've rearranged your reply from "top-posting" style to "bottom-posting"
style.  Please consider using bottom-posting in the future.

I wrote:

> Julien Lepiller <julien@lepiller.eu> writes:
>
>> I'd like to get staging merged soon, as it hasn't been merged for quite
>> some time. Here are some stats about the current state of substitutes for
>> staging:
>>
>> According to guix weather, we have:
>>
>> | architecture | berlin | hydra |
>> +--------------+--------+-------+
>> | x86_64       | 36.5%  | 81.7% |
>> | i686         | 23.8%  | 71.0% |
>> | aarch64      | 22.2%  | 00.0% |
>> | armhf        | 17.0%  | 45.6% |
>>
>> What should the next step be?
>
> I think we should wait until the coverage on armhf and aarch64 has
> increased, for the sake of users on those systems.
>
> Also, I've seen some commits that make me wonder if hydra is still
> being configured as an authorized substitute server on new Guix
> installations.
> Do you know?
>
> If 'berlin' is the only substitute server by default, then we certainly
> need to wait for those numbers to get higher, no?
>
> What do you think?

Julien Lepiller <julien@lepiller.eu> responded:

> I agree, but I wonder if there is a reason for these to be so low?

It's a good question.  I have several hypotheses:

* Unfortunately, it is fairly common for builds for important core
  packages to spuriously fail, often due to unreliable test suites, and
  to cause thousands of other important dependent packages to fail.
  When this happens on Hydra, I can see what's going on, and restart the
  build and all of its dependents.

This is currently a problem: we can't see which dependency is
responsible when a build fails with a dependency failure.


  I wouldn't be surprised if some important core packages spuriously
  failed to build on Berlin, but we have no effective way to see what
  happened there.  If that's the case, the 'guix weather' numbers above
  might never get much higher no matter how long we wait.

* Berlin's build slots may have been occupied for long periods of time
  by 'test.*' jobs stuck in an endless "waiting for udevd..." loop, as
  described in <https://bugs.gnu.org/33362>.

  Hydra's web interface allows me to monitor active jobs and manually
  kill those stuck jobs when I find them.  I don't know how to do that
  on Berlin.

* Especially on armhf and aarch64, where Berlin has very little build
  capacity, and new builds are being added to Berlin's build queue much
  faster than they can be built, it is quite possible that Berlin is
  spending most of its effort on long-outdated builds.

  On Hydra, I can see when this is happening, and often intervene by
  cancelling large numbers of outdated builds on armhf, so that it
  remains focused on the most popular and up-to-date packages.
Berlin currently lacks an admin interface, and we need one, because
cancelling a job should be a privileged operation.


* On WIP branches like 'core-updates' and 'staging', when a new
  evaluation is done, I cancel all outdated Hydra jobs on those
  branches.  I don't know if anything similar is done on Berlin.
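To illustrate the dependency-failure point above, here is a minimal
sketch of how the responsible build could be found: starting from a
build that failed, follow its failed dependencies until reaching builds
whose own dependencies all succeeded; those are the root causes.  This
is a hypothetical Python illustration only.  It does not use Cuirass's
actual data model or API, and the names `deps`, `failed`, `culprits`
and `root_cause_failures` are invented for the example.

```python
from typing import Dict, List, Set

# Hypothetical model: `deps` maps each build to the builds it directly
# depends on; `failed` holds every build reported as failed, including
# those that only failed because a dependency failed.


def culprits(build: str, deps: Dict[str, List[str]], failed: Set[str]) -> Set[str]:
    """Return the root-cause failures responsible for `build` failing."""
    if build not in failed:
        return set()
    seen: Set[str] = set()
    found: Set[str] = set()
    stack = [build]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        failed_deps = [d for d in deps.get(current, []) if d in failed]
        if not failed_deps:
            # None of its dependencies failed, so it failed on its own.
            found.add(current)
        else:
            # Keep walking down through the failed dependencies.
            stack.extend(failed_deps)
    return found


def root_cause_failures(deps: Dict[str, List[str]], failed: Set[str]) -> Set[str]:
    """Return every failed build none of whose dependencies failed."""
    return {b for b in failed if not any(d in failed for d in deps.get(b, []))}


deps = {
    "hello": ["gcc", "glibc"],
    "gcc": ["glibc"],
    "glibc": [],
    "emacs": ["gtk"],
    "gtk": ["glib"],
    "glib": [],
}
failed = {"glib", "gtk", "emacs"}
print(culprits("emacs", deps, failed))  # {'glib'}
```

Restarting only the builds returned by `root_cause_failures`, and then
their dependents, is the kind of targeted intervention described above
for Hydra.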

In summary, there are several things that I regularly do to make
efficient use of Hydra's limited build capacity.  I periodically look at
Berlin's web interface to see how it has progressed, but it is currently
mostly a black box to me.  I see no effective way to focus its limited
resources on the most important builds, or to see when build slots are
stuck.

     Regards,
       Mark
I am currently looking into how to improve the situation. Suggestions are welcome.

Gábor