stability of master - just QA and hydra is not enough

unofficial mirror of guix-devel@gnu.org 

* stability of master - just QA and hydra is not enough
@ 2017-07-01 17:36 ng0
  2017-07-01 18:01 ` Leo Famulari
  0 siblings, 1 reply; 15+ messages in thread
From: ng0 @ 2017-07-01 17:36 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2341 bytes --]

(This is brief and incomplete, just the way I see it right now)
Hi,

In recent months (or rather: on a regular basis), Guix master has
often been unusable.
For Guix to be accepted as a system which anyone can use without
having to run from git, the current deployment process (is this
called deployment? I mean the way commits get into the master
branch) isn't really acceptable, from both a technical and a social
perspective.

I have no formal solution, but I want to have this discussion
because I can no longer stand how often "assumed to work" commits
are pushed. QA is not enough, and waiting for hydra to pick up on
the failure isn't either.
We need to revise the way commits land in master.
Master can be relatively stable. We should aim for stability through
a combination of an extended QA process and a technical approval
mechanism.

Obviously we can't catch every error; that's what hydra/cuirass[0]
is for. What we can and should catch is a set of defined scenarios.
From my perspective, GuixSD is the primary concern here; I don't
care about Guix on other systems.
In this admittedly not well-thought-through scenario (give me 2-3
weeks and I can write down my full ideas; I have a busy schedule), I
imagine that _before_ commits end up in master we build a set of
virtual systems which must at least:

- be built successfully
- run through the initrd
- briefly see the login manager

We then need guidelines that classify which commits get built on
which set of test machines.
Finally, the commit must be approved by more than one person before
it is committed.
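
As a rough illustration, the gate described above could be expressed as a policy check like the following Python sketch. It assumes the outcome of each VM check is already known; the check names and the helpers `CandidateCommit`, `gate_failures` and `may_land` are hypothetical, and nothing here calls Guix, QEMU or Cuirass. It only encodes the two rules of the proposal: all three checks must pass, and more than one person must approve.

```python
from dataclasses import dataclass, field

# Hypothetical check names mirroring the list above; nothing here
# actually builds or boots a virtual machine.
REQUIRED_CHECKS = ("build", "initrd", "login-manager")
MIN_APPROVALS = 2  # "approved by more than one person"


@dataclass
class CandidateCommit:
    commit_id: str
    check_results: dict = field(default_factory=dict)  # check name -> passed?
    approvers: set = field(default_factory=set)


def gate_failures(commit: CandidateCommit) -> list:
    """Return the reasons a commit may not land yet (empty list = may land)."""
    reasons = []
    for check in REQUIRED_CHECKS:
        # A check that never ran counts as a failure.
        if not commit.check_results.get(check, False):
            reasons.append(f"check '{check}' did not pass")
    if len(commit.approvers) < MIN_APPROVALS:
        reasons.append(
            f"needs {MIN_APPROVALS} approvals, has {len(commit.approvers)}"
        )
    return reasons


def may_land(commit: CandidateCommit) -> bool:
    return not gate_failures(commit)


if __name__ == "__main__":
    candidate = CandidateCommit(
        "abc123",
        {"build": True, "initrd": True, "login-manager": False},
        {"alice"},
    )
    for reason in gate_failures(candidate):
        print(reason)
```

Deciding which commits go to which set of test machines (the guidelines mentioned above) is deliberately left out of this sketch.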

There are edge cases and scenarios we cannot test, but what we can
test we should test.
Stability must not be an enterprise feature (as was mentioned in the
past); it is expected by people who don't want to spend their time on
development. Even bug reports only come from those who bother to file
them or are able to. I have more reasons to add when I can send out a
longer email; this is just a bit of an impulse.

0: What is it these days? Is hydra now just an in-retirement frontend
for cuirass, or how does bayfront work these days? I understand cuirass,
not hydra.
-- 
ng0
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588
GnuPG: https://n0is.noblogs.org/my-keys
https://www.infotropique.org https://krosos.org

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: stability of master - just QA and hydra is not enough
  2017-07-01 17:36 stability of master - just QA and hydra is not enough ng0
@ 2017-07-01 18:01 ` Leo Famulari
  2017-07-01 19:24   ` ng0
  2017-07-07  0:09   ` myglc2
  0 siblings, 2 replies; 15+ messages in thread
From: Leo Famulari @ 2017-07-01 18:01 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2194 bytes --]

On Sat, Jul 01, 2017 at 05:36:04PM +0000, ng0 wrote:
> (This is brief and incomplete, just the way I see it right now)

[...]

> imagine that _before_ commits end up in master we build a set of
> virtual systems which at least must:
> 
> - be built successfully
> - run through the initrd
> - briefly see the login manager
> 
> We then need guidelines which commits are classified for building
> on which set of test machines.
> Finally the commit must be approved by more than 1 person and
> committed.
> 
> There are odds and scenarios we can not test, but what we can
> test we should test.
> Stability must not be an enterprise feature (as it was mentioned
> in the past), it is expected by people who don't want to waste
> time with developing. Even reporting bugs is only done by those
> who bother to do so or are able to. I have more to add to the
> reasons when I can send out a longer email, this is just a bit
> of an impulse.

First, is there some outstanding bug that needs to be fixed? It's
frustrating to get messages like this without any context.

I agree that we should strive to make the master branch more reliable.

However, it must be understood that the main Guix contributors are
almost always *at the limit* of how much time and energy they can spend
on Guix.

Adding rules like requiring somebody else to test and approve a change
is unrealistic, since we can barely do what we do now. This suggestion
is basically equivalent to adding things to the patch review queue.

As for automated QA, our build farm is also almost always operating at
its limit. This is an easier problem to solve, because we can spend
money to increase the capacity. However...

> 0: What is it these days? Is hydra now just an in-retirement frontend
> for cuirass or how does bayfront work these days? I understand cuirass,
> not hydra.

... Bayfront is still not fully operational, so hydra.gnu.org is still
serving as the front-end of the build farm. We are still relying on the
Hydra software. That is, the situation is basically the same as before.
Adding build machines will not help very much until the front-end
hardware gets faster.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: stability of master - just QA and hydra is not enough
  2017-07-01 18:01 ` Leo Famulari
@ 2017-07-01 19:24   ` ng0
  2017-07-01 19:52     ` Leo Famulari
  2017-07-07  0:09   ` myglc2
  1 sibling, 1 reply; 15+ messages in thread
From: ng0 @ 2017-07-01 19:24 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2975 bytes --]

Leo Famulari transcribed 3.2K bytes:
> On Sat, Jul 01, 2017 at 05:36:04PM +0000, ng0 wrote:
> > (This is brief and incomplete, just the way I see it right now)
> 
> [...]
> 
> > imagine that _before_ commits end up in master we build a set of
> > virtual systems which at least must:
> > 
> > - be built successfully
> > - run through the initrd
> > - briefly see the login manager
> > 
> > We then need guidelines which commits are classified for building
> > on which set of test machines.
> > Finally the commit must be approved by more than 1 person and
> > committed.
> > 
> > There are odds and scenarios we can not test, but what we can
> > test we should test.
> > Stability must not be an enterprise feature (as it was mentioned
> > in the past), it is expected by people who don't want to waste
> > time with developing. Even reporting bugs is only done by those
> > who bother to do so or are able to. I have more to add to the
> > reasons when I can send out a longer email, this is just a bit
> > of an impulse.
> 
> First, is there some outstanding bug that needs to be fixed? It's
> frustrating to get messages like this without any context.

Yes, but I certainly will not run reconfigure from HEAD on this
machine again. When I ran into this I did not have git set up; now I
have. So someone else must do this.

> I agree that we should strive to make the master branch more reliable.
> 
> However, it must be understood that the main Guix contributors are
> almost always *at the limit* of how much time and energy they can spend
> on Guix.

Sure, hence the disclaimer "brief" on the top. I will write a longer
text later this month to get more into detail about my ideas.

> Adding rules like requiring somebody else to test and approve a change
> is unrealistic, since we can barely do what we do now. This suggestion
> is basically equivalent to adding things to the patch review queue.
>
> As for automated QA, our build farm is also almost always operating at
> its limit. This is an easier problem to solve, because we can spend
> money to increase the capacity. However...
> 
> > 0: What is it these days? Is hydra now just an in-retirement frontend
> > for cuirass or how does bayfront work these days? I understand cuirass,
> > not hydra.
> 
> ... Bayfront is still not fully operational, so hydra.gnu.org is still
> serving as the front-end of the build farm. We are still relying on the
> Hydra software. That is, the situation is basically the same as before.
> Adding build machines will not help very much until the front-end
> hardware gets faster.

So you're basically saying: yes, good idea, I agree, but this puts too
much pressure on too little capacity in people and machines, and we
cannot do any of this any time soon.
Or did I miss something?
-- 
ng0
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588
GnuPG: https://n0is.noblogs.org/my-keys
https://www.infotropique.org https://krosos.org

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: stability of master - just QA and hydra is not enough
  2017-07-01 19:24   ` ng0
@ 2017-07-01 19:52     ` Leo Famulari
  0 siblings, 0 replies; 15+ messages in thread
From: Leo Famulari @ 2017-07-01 19:52 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: Type: text/plain, Size: 942 bytes --]

On Sat, Jul 01, 2017 at 07:24:25PM +0000, ng0 wrote:
> > First, is there some outstanding bug that needs to be fixed? It's
> > frustrating to get messages like this without any context.
> 
> Yes, but I certainly will not run reconfigure from HEAD on this
> machine again. When I ran into this I did not have git set up; now I
> have. So someone else must do this.

I spent my time to find the bug and report it:
<https://bugs.gnu.org/27551>

> So you're basically saying: yes, good idea, I agree, but this puts too
> much pressure on too little capacity in people and machines, and we
> cannot do any of this any time soon.
> Or did I miss something?

I'm saying that we could do better.

I don't think that requiring more review of proposed changes from
project members is a good idea. Already, patch review is too much work.
Not to mention all the other work.

Automated QA is a good idea but we currently lack the resources to do
it.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: stability of master - just QA and hydra is not enough
  2017-07-01 18:01 ` Leo Famulari
  2017-07-01 19:24   ` ng0
@ 2017-07-07  0:09   ` myglc2
  2017-07-07  3:00     ` Guix infrastructure Leo Famulari
  1 sibling, 1 reply; 15+ messages in thread
From: myglc2 @ 2017-07-07  0:09 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel

On 07/01/2017 at 14:01 Leo Famulari writes:

> On Sat, Jul 01, 2017 at 05:36:04PM +0000, ng0 wrote:
[...]
>> 0: What is it these days? Is hydra now just an in-retirement frontend
>> for cuirass or how does bayfront work these days? I understand cuirass,
>> not hydra.
>
> ... Bayfront is still not fully operational, so hydra.gnu.org is still
> serving as the front-end of the build farm. We are still relying on the
> Hydra software. That is, the situation is basically the same as before.
> Adding build machines will not help very much until the front-end
> hardware gets faster.

This leaves me wondering ...

Is the hydra/front-end hardware going to be upgraded?

Is bayfront/cuirass intended to replace hydra?

The bayfront hardware described here ...

https://www.gnu.org/software/guix/news/growing-our-build-farm.html

... seems weak to me. Is there a plan to scale it up and make it redundant?

A reliable, resourced, managed, "nightly Guix build" should pay big
dividends for the project. But, from reading the lists, I get the
impression that such a thing does not exist. Is that correct?

Do we know what would be needed to achieve a complete nightly build?

TIA - George


* Guix infrastructure
  2017-07-07  0:09   ` myglc2
@ 2017-07-07  3:00     ` Leo Famulari
  2017-07-07 12:19       ` Ludovic Courtès
  2017-07-09  6:30       ` Efraim Flashner
  0 siblings, 2 replies; 15+ messages in thread
From: Leo Famulari @ 2017-07-07  3:00 UTC (permalink / raw)
  To: myglc2; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1945 bytes --]

On Thu, Jul 06, 2017 at 08:09:17PM -0400, myglc2 wrote:
> On 07/01/2017 at 14:01 Leo Famulari writes:
> > ... Bayfront is still not fully operational, so hydra.gnu.org is still
> > serving as the front-end of the build farm. We are still relying on the
> > Hydra software. That is, the situation is basically the same as before.
> > Adding build machines will not help very much until the front-end
> > hardware gets faster.
> 
> This leaves me wondering ...
> 
> Is the hydra/front-end hardware going to be upgraded?

Yes...

> Is bayfront/cuirass intended to replace hydra?

... and yes.

> The bayfront hardware described here ...
> 
> https://www.gnu.org/software/guix/news/growing-our-build-farm.html
> 
> ... seems weak to me. Is there a plan to scale it up and make it redundant?

It will be a lot more powerful than the current Hydra system. As for
specific plans, I'll let those administering the system chime in.

> A reliable, resourced, managed, "nightly Guix build" should pay big
> dividends for the project. But, from reading the lists, I get the
> impression that such a thing does not exist. Is that correct?

Currently, we tend to build all the packages as often as we can with our
resources, which is less than once a day.

> Do we know what would be needed to achieve a complete nightly build?

It depends on what you mean by "complete".

I doubt we can find armhf hardware that could build all the packages
daily. That platform doesn't get very powerful in general and, in my
experience, the machines that do exist can't handle sustained high
loads, nor do they have fast network and I/O interfaces.

It is possible for x86_64, i686, and eventually for aarch64. Maybe we
will be able to cross-build from aarch64 to armhf; I'm not sure. Efraim?

Ricardo has been working on getting some new x86_64 / i686 builders
online:

https://gnunet.org/bot/log/guix/2017-06-30#T1433202

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Guix infrastructure
  2017-07-07  3:00     ` Guix infrastructure Leo Famulari
@ 2017-07-07 12:19       ` Ludovic Courtès
  2017-07-08 23:50         ` ng0
  2017-07-09  0:43         ` myglc2
  2017-07-09  6:30       ` Efraim Flashner
  1 sibling, 2 replies; 15+ messages in thread
From: Ludovic Courtès @ 2017-07-07 12:19 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel, myglc2

Hello Guix!

Leo Famulari <leo@famulari.name> skribis:

> On Thu, Jul 06, 2017 at 08:09:17PM -0400, myglc2 wrote:
>> On 07/01/2017 at 14:01 Leo Famulari writes:
>> > ... Bayfront is still not fully operational, so hydra.gnu.org is still
>> > serving as the front-end of the build farm. We are still relying on the
>> > Hydra software. That is, the situation is basically the same as before.
>> > Adding build machines will not help very much until the front-end
>> > hardware gets faster.
>> 
>> This leaves me wondering ...
>> 
>> Is the hydra/front-end hardware going to be upgraded?
>
> Yes...
>
>> Is bayfront/cuirass intended to replace hydra?
>
> ... and yes.
>
>> The bayfront hardware described here ...
>> 
>> https://www.gnu.org/software/guix/news/growing-our-build-farm.html
>> 
>> ... seems weak to me. Is there a plan to scale it up and make it redundant?
>
> It will be a lot more powerful than the current Hydra system. As for
> specific plans, I'll let those administering the system chime in.

That machine is super powerful… but alas, it has also been terribly
unstable.  Vikings has been kind enough to assist us; they’ve notably
provided us with CPU replacements once already.  Despite these efforts,
the machine is still crashing.  We’re investigating with them what to
do next.

On top of that, all the testing and all the back and forth takes an
awful lot of time, which is in part due to hardware problems being hard
to pinpoint and debug in general, and in part due to us here in Bordeaux
(where the machine is hosted) being unable to scale up.

Infrastructure has been the project’s Achilles’ heel since we ran the
crowdfunding campaign in Dec. 2015 (!).  Now it’s becoming detrimental
to the project.  Our initial plan was to buy more Libreboot-based
machines like the one above once the first one has proved to work well.
However, given the situation, we’ve been discussing on guix-sysadmin a
change in strategy, at least in the short term; Ricardo has been working
on re-purposing used hardware for our needs, and that may well be our
short- to medium-term solution.

For now, we prefer not to entrust the binaries we deliver to commercial
VPS providers.  I think we owe it to our users, but it undoubtedly has a
cost in terms of system maintenance.

I not only sympathize with your frustration, ng0, I feel it even ten
times more.  ;-)  Several of us are determined to come up with a
solution quickly, so I hope that will materialize soon!

Thanks,
Ludo’.


* Re: Guix infrastructure
  2017-07-07 12:19       ` Ludovic Courtès
@ 2017-07-08 23:50         ` ng0
  2017-07-09  9:21           ` Ricardo Wurmus
  2017-07-09  0:43         ` myglc2
  1 sibling, 1 reply; 15+ messages in thread
From: ng0 @ 2017-07-08 23:50 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, mail

[-- Attachment #1: Type: text/plain, Size: 3887 bytes --]

Ludovic Courtès transcribed 2.5K bytes:

> I not only sympathize with your frustration, ng0, I feel it even ten
> times more.  ;-)  Several of us are determined to come up with a
> solution quickly, so I hope that will materialize soon!
> 
> Thanks,
> Ludo’.
> 
> 
Okay.

About that..

It was not just that no builds were available. As someone who is primarily
considering basing a project on GuixSD, and who secondarily and within that
scope contributes to GuixSD, these are the really grave issues for me at
the moment:

- master is not stable, and it is not being treated as a high-priority
  problem; at least that is my impression from remarks in chat/emails
  and what I've been able to read. All arguing aside, that's something
  which can be fixed. CC'd dvn: in our last mumble session you mentioned
  that ii could get in touch with Guix. I don't think you'll forget it,
  but here's an email to start with.
  I think you (Ludovic) suggested something similar to what ii is
  offering, but maybe I just imagined that you commented on that.

- a bug in the compiler which is used in the core of Guix is bad. My
  suggestion that we could at least try to work around it by reducing
  the module sizes is met with arguments like "this will be fixed in the
  future; for now we can only split 1 module, the rest has to stay
  together for semantic and linguistic reasons".
  If my understanding of the whole situation is wrong, this is due to
  the opaque handling of this serious problem and the way my idea for a
  temporary fix was received.

  For me, GuixSD as it is at the moment is unusable: not because of my
  own hardware, knowledge, or devices, but because the project I am
  running targets hardware which cannot avoid this bug.
  On my side, even once we set up our own build infrastructure, that
  will not change the fact that the current pull/make uses far too many
  resources for the end result I target.

- Writing system services in Shepherd is hard. Debugging these
  services is a major pain compared to writing services for OpenRC or
  systemd. I'm no expert in Guile, though I understand more than I did
  2.7(?) years ago (coming to Guix was my first exposure to Guile). I'm
  not really sure why OpenRC is easier; I just know what would work and
  what wouldn't. It has its limits, and it operates in a different
  system structure.

- Debugging in general. It would be *very* good if debugging symbols and
  capabilities were not a 'write your own local inherits and overrides
  to get debug outputs' thing. This is not just my opinion; people have
  brought this up in off-the-record chats (and possibly even on IRC)
  before.

These are the major issues Guix could fix.
There are more that I know will not or cannot be fixed[0]; those are
the reasons why we are currently re-evaluating our choice of system.
Maybe it turns out that playing the high-speed chase with Guix is the
only sane choice. Maybe it doesn't. There are no hard feelings if the
issues above are not fixed; they just make it frustrating to work with
Guix. The frustration set in when the 0.13 / Guile 2.2.2 related
bug(s) were added to my list of existing issues.

When I came to GuixSD in Winter 2015 I saw a huge potential. I still see
it. I hope we can fix this, no matter how the re-evaluation on our side
turns out. 

0: Most of these issues are differences in goals and how they apply to
   what is technically implemented, etc.
   Public docs are incomplete and opaque, so the links below
   (for the curious) will not represent the actual state of what is
   being worked towards.
-- 
ng0
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588
GnuPG: https://n0is.noblogs.org/my-keys
https://www.infotropique.org https://krosos.org

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Guix infrastructure
  2017-07-07 12:19       ` Ludovic Courtès
  2017-07-08 23:50         ` ng0
@ 2017-07-09  0:43         ` myglc2
  2017-07-09  8:49           ` Ricardo Wurmus
  1 sibling, 1 reply; 15+ messages in thread
From: myglc2 @ 2017-07-09  0:43 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On 07/07/2017 at 14:19 Ludovic Courtès writes:

> Hello Guix!
>
> Leo Famulari <leo@famulari.name> skribis:
>
>> On Thu, Jul 06, 2017 at 08:09:17PM -0400, myglc2 wrote:
>>> On 07/01/2017 at 14:01 Leo Famulari writes:
>>> > ... Bayfront is still not fully operational, so hydra.gnu.org is still
>>> > serving as the front-end of the build farm. We are still relying on the
>>> > Hydra software. That is, the situation is basically the same as before.
>>> > Adding build machines will not help very much until the front-end
>>> > hardware gets faster.
>>> 
>>> This leaves me wondering ...
>>> 
>>> Is the hydra/front-end hardware going to be upgraded?
>>
>> Yes...
>>
>>> Is bayfront/cuirass intended to replace hydra?
>>
>> ... and yes.
>>
>>> The bayfront hardware described here ...
>>> 
>>> https://www.gnu.org/software/guix/news/growing-our-build-farm.html
>>> 
>>> ... seems weak to me. Is there a plan to scale it up and make it redundant?
>>
>> It will be a lot more powerful than the current Hydra system. As for
>> specific plans, I'll let those administering the system chime in.
>
> That machine is super powerful…

Well, I disagree. A 2010 motherboard with 2 x 2011 CPUs (16 core at
1.6GHz) is weak compared to modern servers. 

> but alas, it has also been terribly
> unstable.  Vikings has been kind enough to assist us; they’ve notably
> provided us with CPU replacements once already.  Despite these efforts,
> the machine is still crashing.  We’re investigating with them what to
> do next.
>
> On top of that, all the testing and all the back and forth takes an
> awful lot of time, which is in part due to hardware problems being hard
> to pinpoint and debug in general, and in part due to us here in Bordeaux
> (where the machine is hosted) being unable to scale up.

As you have experienced here, the learning/deployment costs and hassle
associated with each new type of server often dwarfs other costs. The
best way to minimize this is to minimize the number of types of servers
you own. In practice this means you need to place your bets carefully
and quickly cut your losses if things don't work out.  It also means that
when (not if) a server breaks you should buy one exactly like the broken
one.

At this point it makes sense to abandon the Vikings motherboard and
choose a popular, mainstream, current x86_64 motherboard. Since AMD has
not been a competitive server vendor for the last ~8 years this means,
practically speaking, picking a popular intel-based motherboard.

> Infrastructure has been the project’s Achilles’ heel since we ran the
> crowdfunding campaign in Dec. 2015 (!).  Now it’s becoming detrimental
> to the project.  Our initial plan was to buy more Libreboot-based
> machines like the one above once the first one has proved to work
> well.  However, given the situation, we’ve been discussing on
> guix-sysadmin a change in strategy, at least in the short term;
> Ricardo has been working on re-purposing used hardware for our needs,
> and that may well be our short- to medium-term solution.

Hmm, I didn't know about guix-sysadmin until now, but I couldn't easily
read it. So, FWIW, here are some additional comments/suggestions ...

It should be pretty easy to estimate the requirements to run the front
end and do a nightly x86_64 build of GuixSD, projected out 3 years. Do
we have a handle on what this is?

You should buy hardware that supports this and plan to discard it in 3
years.

Since things always break, visualize a system in which every server has
a redundant warm backup or is a pair of servers at 50% load.

You can choose between amazingly cheap used servers that guzzle power or
new servers that use less power. If you buy used computers ~ 3 years old
the total cost of ownership will be nearly a wash over ~ 3 years of
deployment. So, if you are cash rich, buy new computers. If you are
cash-poor, buy used computers, but don't buy anything more than 3 years
old, unless you want a computer museum ;-)
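
The used-vs-new comparison above can be made concrete with a back-of-the-envelope Python sketch: total cost of ownership as purchase price plus electricity for running 24/7 over the deployment period. The function name and every input value are placeholders, not real quotes or measurements, and the model ignores cooling, hosting and repair costs.

```python
HOURS_PER_YEAR = 24 * 365


def total_cost_of_ownership(purchase_price, avg_watts, price_per_kwh, years=3):
    """Purchase price plus the energy cost of running 24/7 for `years`."""
    kwh = avg_watts / 1000 * HOURS_PER_YEAR * years
    return purchase_price + kwh * price_per_kwh


if __name__ == "__main__":
    # Placeholder inputs only; replace with real prices and measured power draw.
    used = total_cost_of_ownership(purchase_price=800, avg_watts=300, price_per_kwh=0.20)
    new = total_cost_of_ownership(purchase_price=3000, avg_watts=150, price_per_kwh=0.20)
    print(f"used: {used:.0f}, new: {new:.0f}")
```

Whether the two options come out "nearly a wash" depends entirely on the real prices, power draw and local electricity rate that get plugged in.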

The benefit of a used server is that it comes assembled and tested and
probably has good driver support. Shiny new motherboards expose you to
the risk of unstable drivers and BIOS.  So, if you want new, you should
probably buy last year's model ;-)

In either case, it is tempting to assemble the computer. But this is not
a good idea because there is always some glitch.

The best strategy would be to buy assembled, tested servers with GuixSD
installed and running. If you can't find a vendor that will do that for
you, buy a test unit on the condition that the machine will be returned
if GuixSD doesn't install smoothly.

Specify RAIDed SSDs or, ideally, NVMe drives.

Acquire a test unit quickly. If it works great, buy another one (or
two)! If not, ditch it right away and try something else. Based on my
experience it is easy to install GuixSD on 3 year old intel-based
hardware.  And I expect it should be equally easy to install on 1 year
old intel-based hardware.

Finally, WRT the bayfront hardware, when you said, "That machine is
super powerful…" I guess this is relative to the existing hydra front
end and on the assumption that it is only used as a front end.

But this raises the obvious question, isn't it possible to specify a
bayfront server that can also build "all of x86_64 Guix" in a day?  If
so, this would be simpler, wouldn't it? Then, if/when it becomes
overloaded you can supplement it with x86_64 build machines and "proxy"
front ends, can't you?

> For now, we prefer not to entrust the binaries we deliver to
> commercial VPS providers.  I think we owe it to our users, but it
> undoubtedly has a cost in terms of system maintenance.

Owning servers gives you more control and saves you time and money over
the long term. It will also improve the quality of the GuixSD "product"
offering for servers. So, IMO this is the right thing to do at this
time.

But it would dramatically legitimize GuixSD for some people if it were
also deployed on AWS. So this would probably be a good thing to
visualize for the future.

HTH - George


* Re: Guix infrastructure
  2017-07-07  3:00     ` Guix infrastructure Leo Famulari
  2017-07-07 12:19       ` Ludovic Courtès
@ 2017-07-09  6:30       ` Efraim Flashner
  1 sibling, 0 replies; 15+ messages in thread
From: Efraim Flashner @ 2017-07-09  6:30 UTC (permalink / raw)
  To: Leo Famulari, myglc2; +Cc: guix-devel



On July 7, 2017 6:00:42 AM GMT+03:00, Leo Famulari <leo@famulari.name> wrote:
>On Thu, Jul 06, 2017 at 08:09:17PM -0400, myglc2 wrote:
>> On 07/01/2017 at 14:01 Leo Famulari writes:
>> > ... Bayfront is still not fully operational, so hydra.gnu.org is still
>> > serving as the front-end of the build farm. We are still relying on the
>> > Hydra software. That is, the situation is basically the same as before.
>> > Adding build machines will not help very much until the front-end
>> > hardware gets faster.
>> 
>> This leaves me wondering ...
>> 
>> Is the hydra/front-end hardware going to be upgraded?
>
>Yes...
>
>> Is bayfront/cuirass intended to replace hydra?
>
>... and yes.
>
>> The bayfront hardware described here ...
>> 
>> https://www.gnu.org/software/guix/news/growing-our-build-farm.html
>> 
>> ... seems weak to me. Is there a plan to scale it up and make it redundant?
>
>It will be a lot more powerful than the current Hydra system. As for
>specific plans, I'll let those administering the system chime in.
>
>> A reliable, resourced, managed, "nightly Guix build" should pay big
>> dividends for the project. But, from reading the lists, I get the
>> impression that such a thing does not exist. Is that correct?
>
>Currently, we tend to build all the packages as often as we can with our
>resources, which is less than once a day.
>
>> Do we know what would be needed to achieve a complete nightly build?
>
>It depends on what you mean by "complete".
>
>I doubt we can find armhf hardware that could build all the packages
>daily. That platform doesn't get very powerful in general and, in my
>experience, the machines that do exist can't handle sustained high
>loads, nor do they have fast network and I/O interfaces.
>
>It is possible for x86_64, i686, and eventually for aarch64. Maybe we
>will be able to cross-build from aarch64 to armhf; I'm not sure. Efraim?

In theory it should be possible to build and run armhf packages on aarch64; in practice it's not always the case. http://sjoerd.luon.net/posts/2017/07/debian-armhf-vm-on-arm64/ says:

On the 64 bit ARM side, we're running on Gigabyte MP30-AR1 based servers which can run 32 bit arm code (As opposed to e.g. ThunderX based servers which can only run 64 bit code). As such running armhf VMs on them to act as build slaves seems a good choice, but setting that up is a bit more involved than it might appear.


>
>Ricardo has been working on getting some new x86_64 / i686 builders
>online:
>
>https://gnunet.org/bot/log/guix/2017-06-30#T1433202

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


* Re: Guix infrastructure
  2017-07-09  0:43         ` myglc2
@ 2017-07-09  8:49           ` Ricardo Wurmus
  2017-07-11 18:44             ` Catonano
  0 siblings, 1 reply; 15+ messages in thread
From: Ricardo Wurmus @ 2017-07-09  8:49 UTC (permalink / raw)
  To: myglc2; +Cc: guix-devel


myglc2 <myglc2@gmail.com> writes:

>>>> The bayfront hardware described here ...
>>>>
>>>> https://www.gnu.org/software/guix/news/growing-our-build-farm.html
>>>>
>>>> ... seems weak to me. Is there a plan to scale it up and make it redundant?
>>>
>>> It will be a lot more powerful than the current Hydra system. As for
>>> specific plans, I'll let those administering the system chime in.
>>
>> That machine is super powerful…
>
> Well, I disagree. A 2010 motherboard with 2 x 2011 CPUs (16 core at
> 1.6GHz) is weak compared to modern servers.

The machine is sufficiently powerful for what it should do.  It just
doesn’t do that yet, because it crashes.  Bayfront is *not* the build
farm, it’s just the front-end of that build farm.

> As you have experienced here, the learning/deployment costs and hassle
> associated with each new type of server often dwarfs other costs. The
> best way to minimize this is to minimize the number of types of servers
> you own.

The servers I’m preparing to add to the build farm once I’m back to the
office are all of the same type.  There are dozens of them.

> At this point it makes sense to abandon the Vikings motherboard and
> choose a popular, mainstream, current x86_64 motherboard. Since AMD has
> not been a competitive server vendor for the last ~8 years this means,
> practically speaking, picking a popular intel-based motherboard.

There is a group of sysadmins in contact with Vikings and taking care of
the build farm.  I’d rather keep the discussions about how to move
forward with our servers there.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

* Re: Guix infrastructure
  2017-07-08 23:50         ` ng0
@ 2017-07-09  9:21           ` Ricardo Wurmus
  2017-07-09 12:06             ` Liam Wigney
  0 siblings, 1 reply; 15+ messages in thread
From: Ricardo Wurmus @ 2017-07-09  9:21 UTC (permalink / raw)
  To: ng0; +Cc: guix-devel

Hi ng0,

> - master is not stable and it is not being treated as a high priority
>   problem

I don’t know where you get this from and I don’t appreciate the
insinuation that we don’t care.  The vast majority of commits to
“master” are totally fine.

As we don’t have the resources for maintaining a stable branch, “master”
is a best effort.

> - a bug in the compiler which is used in the core of Guix is bad.

We all agree here.  I don’t see the point of reiterating it.  The people
who can fix it are already working on it — in their own time and in
*addition* to all the things they regularly do.

Here’s a shout out to Ludo who tirelessly fixes old and new bugs,
implements new features, improves performance, deals with GSoC, and
answers community questions; to Andy Wingo who continuously improves
Guile performance, implements new Guix services, drafted and implemented
the potluck faster than I could blink, …; to Leo and Mark and Marius who
keep on top of security issues despite the fact that this is no fun; —
the list goes on and on.

Andy and Ludo are working on the Guile bug already.  I don’t see how
this can reasonably result in complaints.

>   In my
>   understanding that we could at least try to evade this by reducing the
>   module sizes is met with arguments like "this will be fixed in the
>   future, for now we can only split 1 module the rest has to stay
>   together for semantic and linguistic reasons".
>   If my understanding of the whole situation is wrong this is due to the
>   intransparent dealing with this serious problem and the way my idea
>   to temporarily fix it was met.

“Intransparent”?  I don’t know what else to say here.

Breaking up modules is *not* a fix, not even a temporary fix.  How would
this help when Guile never frees memory and the cumulative usage ends up
being the same?  This is something that needs to be fixed in Guile and
both Andy and Ludo have already spent time to investigate this and come
up with solutions.

I also wrote that splitting up (gnu packages python) is fine – yet I
have not seen a patch that would do this.  There’s only so much a single
person can do.

I’m skipping the rest of the complaints in this paragraph, because they
add nothing new and ignore the late night efforts of people in the Guix
and Guile communities.

> - Writing system services in Shepherd is hard.

I beg to differ.  If you have legitimate concerns please point out the
sections in the manuals that are unclear and propose changes.

> These are the major issues Guix could fix.

“Guix” is people.

Personally, I don’t want to spend more time on this discussion, because
I want to get back to getting things done that probably only few people
will see or notice, but which need to be done anyway.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

* Re: Guix infrastructure
  2017-07-09  9:21           ` Ricardo Wurmus
@ 2017-07-09 12:06             ` Liam Wigney
  2017-07-09 22:57               ` ng0
  0 siblings, 1 reply; 15+ messages in thread
From: Liam Wigney @ 2017-07-09 12:06 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hey all,

While I'm aware that server power was mentioned as an issue, OpenQA might be of interest for automated testing.

* Re: Guix infrastructure
  2017-07-09 12:06             ` Liam Wigney
@ 2017-07-09 22:57               ` ng0
  0 siblings, 0 replies; 15+ messages in thread
From: ng0 @ 2017-07-09 22:57 UTC (permalink / raw)
  To: guix-devel

I have no time at the moment for a full reply,
but I think we got off on the wrong foot, Ricardo.

I guess you read between the lines
that I was being negative about everyone's work.
I don't have any subtext.
What I could have done better is go into more detail.

Where did I get the impression of Guix and Guile being
intransparent about the bug? I opened a bug, saw one thread
on the Guile mailing list, and that's it. Given that
I am not omniscient, my only logical assumption was that
not much was happening in public.

The rest was run over, but we had some chats over the
weekend, and our solution on our side of this is
clearer now. It's not yet ready to be published, but I'll
keep Guix in the loop.

-- 
ng0
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588
GnuPG: https://n0is.noblogs.org/my-keys
https://www.infotropique.org https://krosos.org

* Re: Guix infrastructure
  2017-07-09  8:49           ` Ricardo Wurmus
@ 2017-07-11 18:44             ` Catonano
  0 siblings, 0 replies; 15+ messages in thread
From: Catonano @ 2017-07-11 18:44 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel, myglc2

Wow, I had totally missed this thread!

2017-07-09 10:49 GMT+02:00 Ricardo Wurmus <rekado@elephly.net>:

>
> myglc2 <myglc2@gmail.com> writes:
>
> > As you have experienced here, the learning/deployment costs and hassle
> > associated with each new type of server often dwarfs other costs. The
> > best way to minimize this is to minimize the number of types of servers
> > you own.
>
> The servers I’m preparing to add to the build farm once I’m back to the
> office are all of the same type.  There are dozens of them.
>

Ah, this is good news!

I can't wait :-)

Thread overview: 15+ messages
2017-07-01 17:36 stability of master - just QA and hydra is not enough ng0
2017-07-01 18:01 ` Leo Famulari
2017-07-01 19:24   ` ng0
2017-07-01 19:52     ` Leo Famulari
2017-07-07  0:09   ` myglc2
2017-07-07  3:00     ` Guix infrastructure Leo Famulari
2017-07-07 12:19       ` Ludovic Courtès
2017-07-08 23:50         ` ng0
2017-07-09  9:21           ` Ricardo Wurmus
2017-07-09 12:06             ` Liam Wigney
2017-07-09 22:57               ` ng0
2017-07-09  0:43         ` myglc2
2017-07-09  8:49           ` Ricardo Wurmus
2017-07-11 18:44             ` Catonano
2017-07-09  6:30       ` Efraim Flashner
