unofficial mirror of guix-devel@gnu.org 
* Update on bordeaux.guix.gnu.org
@ 2021-11-24  8:52 Christopher Baines
  2021-11-28 17:26 ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Christopher Baines @ 2021-11-24  8:52 UTC (permalink / raw)
  To: guix-devel

Hey!

It's been 3 months since I sent the last update [1]. This email was
meant to go out on Friday, but it seems opensmtpd was broken on my
machine, so it got stuck.

1: https://lists.gnu.org/archive/html/guix-devel/2021-08/msg00075.html

First, some good things:

I've been doing some performance tuning: submitting builds is now more
parallelised, a source of slowness when fetching builds has been
addressed, and one of the long queries involved in allocating builds has
been removed, which also improved handling of the WAL (SQLite
write-ahead log).
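
As background on why long queries and the WAL interact: in SQLite's WAL
mode, a checkpoint cannot fully reset the log while another connection
still holds an older read snapshot, so long-running queries tend to let
the WAL file grow. The following is only a minimal Python sketch of
enabling WAL mode and forcing a checkpoint; it is not the coordinator's
code, and the table and function names are made up for illustration.

```python
import os
import sqlite3
import tempfile


def checkpoint_wal(db_path):
    """Put a database in WAL mode, write one row, then checkpoint the WAL.

    Returns (journal_mode, busy). busy is 0 when the checkpoint was not
    blocked by another connection.
    """
    conn = sqlite3.connect(db_path)
    try:
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        # Hypothetical table, only here so that there is something to write.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS builds (id INTEGER PRIMARY KEY, drv TEXT)"
        )
        conn.execute("INSERT INTO builds (drv) VALUES (?)", ("example.drv",))
        conn.commit()
        # TRUNCATE copies WAL frames into the main database file and then
        # truncates the WAL file, provided no reader is in the way.
        busy, _log_frames, _checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)"
        ).fetchone()
        return mode, busy
    finally:
        conn.close()


def demo():
    """Run the sketch against a throwaway database file."""
    with tempfile.TemporaryDirectory() as tmp:
        return checkpoint_wal(os.path.join(tmp, "example.db"))
```

With a single connection and no concurrent readers, the checkpoint
should not report being blocked.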

There are also a few new features. Agents can be deactivated, which
means they won't get any builds allocated. The coordinator now checks
the hashes of outputs which are submitted, a safeguard which I added
because the coordinator now also supports resuming the uploads of
outputs. This is particularly important when trying to upload large
(> 1GiB) outputs over slow connections.
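
To illustrate why a hash check pairs naturally with resumable uploads,
here is a minimal Python sketch. It is not the coordinator's actual
protocol: the class, its method names, and the use of SHA-256 over the
raw bytes are all assumptions made for the example. The idea is that the
receiver reports how many bytes it already has, accepts only a chunk
that continues from that offset, and compares the final hash against the
expected one before accepting the output.

```python
import hashlib


class PartialUpload:
    """Hypothetical receiver-side state for an upload arriving in pieces."""

    def __init__(self, expected_sha256):
        self.expected_sha256 = expected_sha256
        self.data = bytearray()

    def bytes_received(self):
        """Offset from which the sender should resume."""
        return len(self.data)

    def append(self, offset, chunk):
        """Accept a chunk only if it continues exactly where we stopped."""
        if offset != len(self.data):
            raise ValueError("chunk does not start at the resume offset")
        self.data.extend(chunk)

    def finish(self):
        """Return True only if the assembled bytes match the expected hash."""
        return hashlib.sha256(self.data).hexdigest() == self.expected_sha256


def upload_with_interruption(payload, cut_at):
    """Simulate an upload that is interrupted once and then resumed."""
    upload = PartialUpload(hashlib.sha256(payload).hexdigest())
    upload.append(0, payload[:cut_at])
    # After reconnecting, the sender asks where to continue from.
    resume_from = upload.bytes_received()
    upload.append(resume_from, payload[resume_from:])
    return upload.finish()
```

A real implementation would stream to disk rather than hold everything
in memory; the in-memory buffer just keeps the sketch short.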

I also added a new x86_64 build machine. It's a 4 core Intel NUC that I
had sitting around, but I cleaned it up and got it building things. This
was particularly useful as I was able to use it to retry building
guile@3.0.7, which is extremely hard to build [2]. This was blocking
building the channel instance derivations for x86_64-linux.

2: https://data.guix.gnu.org/gnu/store/7k6s13bzbz5fd72ha1gx9rf6rrywhxzz-guile-3.0.7.drv

On the related subject of data.guix.gnu.org (which is the source of
derivations for bordeaux.guix.gnu.org, as well as a recipient of build
information), there have been a couple of changes. There was some web
crawler activity that was slowing data.guix.gnu.org down significantly,
so nginx now has some rate limiting configuration to prevent crawlers
abusing the service. The other change is that substitutes for the latest
processed revision of master will be queried on a regular basis, so this
page [3] should be roughly up to date, including for ci.guix.gnu.org.

3: https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-substitute-availability

Now for some not so good things:

Submitting builds wasn't working quite right for around a month: one of
the changes I made to speed things up led to some builds being
missed. This is now fixed, and all the missed builds have been
submitted, but this was more than 50,000 builds. This, along with all
the channel instance derivation builds that can now proceed, means that
there's a very large backlog of x86 and ARM builds which will probably
take at least another week to clear. While this backlog exists,
substitute availability for x86_64-linux will be lower than usual.

Space is running out on bayfront, the machine that runs the coordinator,
stores all the nars and build logs, and serves the substitutes. I knew
this was probably going to be an issue, as bayfront didn't have much
space to begin with, but I had hoped I'd be further forward in
developing some way to allow moving the nars around between multiple
machines, to remove the need to store all of them on bayfront. I have
got a plan (there are some ideas I mentioned back in February [4]), but
I haven't got around to implementing anything yet. The disk space usage
trend is pretty much linear, so if things continue without any change, I
think it will be necessary to pause the agents within a month, to avoid
filling up bayfront entirely.

4: https://lists.gnu.org/archive/html/guix-devel/2021-02/msg00104.html
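
For anyone wanting to reproduce this kind of estimate, a linear trend
can be extrapolated with a simple least-squares fit. The Python sketch
below is illustrative only: the function name and the sample figures are
made up, and the 0.9 default is just an example fraction of capacity at
which one might decide to pause the agents.

```python
def days_until_threshold(samples, capacity_bytes, threshold=0.9):
    """Estimate days left before disk usage crosses threshold * capacity.

    samples is a list of (day, used_bytes) pairs. A straight line is
    fitted by least squares. Returns None if usage is not growing,
    otherwise the number of days after the last sample (never negative).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    slope = sxy / sxx  # bytes per day
    if slope <= 0:
        return None
    intercept = mean_used - slope * mean_day
    crossing_day = (threshold * capacity_bytes - intercept) / slope
    last_day = max(day for day, _ in samples)
    return max(0.0, crossing_day - last_day)
```

For example, with made-up samples of 700, 750 and 800 units used on days
0, 10 and 20 of a 1000-unit disk, usage grows by 5 units per day and the
90% mark is 20 days after the last sample.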

Thanks,

Chris

* Re: Update on bordeaux.guix.gnu.org
  2021-11-24  8:52 Update on bordeaux.guix.gnu.org Christopher Baines
@ 2021-11-28 17:26 ` Ludovic Courtès
  2021-11-28 19:54   ` Ricardo Wurmus
  2021-12-03  9:39   ` Christopher Baines
  0 siblings, 2 replies; 10+ messages in thread
From: Ludovic Courtès @ 2021-11-28 17:26 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Hello,

Christopher Baines <mail@cbaines.net> skribis:

> I've been doing some performance tuning, submitting builds is now more
> parallelised, a source of slowness when fetching builds has been
> addressed, and one of the long queries involved in allocating builds has
> been removed, which also improved handling of the WAL (Sqlite write
> ahead log).
>
> There's also a few new features. Agents can be deactivated which means
> they won't get any builds allocated. The coordinator now checks the
> hashes of outputs which are submitted, a safeguard which I added because
> the coordinator now also supports resuming the uploads of outputs. This
> is particularly important when trying to upload large (> 1GiB) outputs
> over slow connections.
>
> I also added a new x86_64 build machine. It's a 4 core Intel NUC that I
> had sitting around, but I cleaned it up and got it building things. This
> was particularly useful as I was able to use it to retry building
> guile@3.0.7, which is extremely hard to build [2]. This was blocking
> building the channel instance derivations for x86_64-linux.
>
> 2: https://data.guix.gnu.org/gnu/store/7k6s13bzbz5fd72ha1gx9rf6rrywhxzz-guile-3.0.7.drv

Neat!  (Though I wouldn’t say building Guile is “extremely hard”,
especially on x86_64.  :-))  The ability to keep retrying is very
welcome.

> On the related subject of data.guix.gnu.org (which is the source of
> derivations for bordeaux.guix.gnu.org, as well as a recipient of build
> information), there have been a couple of changes. There was some web
> crawler activity that was slowing data.guix.gnu.org down significantly,
> NGinx now has some rate limiting configuration to prevent crawlers
> abusing the service. The other change is that substitutes for the latest
> processed revision of master will be queried on a regular basis, so this
> page [3] should be roughly up to date, including for ci.guix.gnu.org.
>
> 3: https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-substitute-availability

That’s good news.  That also means that things like
<https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility>
should be more up-to-date, which is really cool!  This can have a
drastic impact on how we monitor and address reproducibility issues.

> Now for some not so good things:
>
> Submitting builds wasn't working quite right for around a month, one of
> the changes I made to speed things up led to some builds being
> missed. This is now fixed, and all the missed builds have been
> submitted, but this was more than 50,000 builds. This, along with all
> the channel instance derivation builds that can now proceed mean that
> there's a very large backlog of x86 and ARM builds which will probably
> take at least another week to clear. While this backlog exists,
> substitute availability for x86_64-linux will be lower than usual.

At least it’s nice to have a clear picture of which builds are missing,
how much of a backlog we have, and what needs to be rebuilt.

> Space is running out on bayfront, the machine that runs the coordinator,
> stores all the nars and build logs, and serves the substitutes. I knew
> this was probably going to be an issue, bayfront didn't have much space
> to begin with, but I had hoped I'd be further forward in developing some
> way to allow moving the nars around between multiple machines, to remove
> the need to store all of them on bayfront. I have got a plan, there's
> some ideas I mentioned back in February [4], but I haven't got around to
> implementing anything yet. The disk space usage trend is pretty much
> linear, so if things continue without any change, I think it will be
> necessary to pause the agents within a month, to avoid filling up
> bayfront entirely.

Ah, bummer.  I hope we can find a solution one way or another.
Certainly we could replicate nars on another machine with more disk,
possibly buying the necessary hardware with the project funds.

Thanks for the update!

Ludo’.


* Re: Update on bordeaux.guix.gnu.org
  2021-11-28 17:26 ` Ludovic Courtès
@ 2021-11-28 19:54   ` Ricardo Wurmus
  2021-12-01 17:42     ` Ludovic Courtès
  2021-12-03 10:17     ` Update " Christopher Baines
  2021-12-03  9:39   ` Christopher Baines
  1 sibling, 2 replies; 10+ messages in thread
From: Ricardo Wurmus @ 2021-11-28 19:54 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel


Ludovic Courtès <ludo@gnu.org> writes:

>> The disk space usage trend is pretty much
>> linear, so if things continue without any change, I think it 
>> will be
>> necessary to pause the agents within a month, to avoid filling 
>> up
>> bayfront entirely.
>
> Ah, bummer.  I hope we can find a solution one way or another.
> Certainly we could replicate nars on another machine with more 
> disk,
> possibly buying the necessary hardware with the project funds.

Remember that I’ve got three 256G SSDs here that I could send to
wherever bayfront now sits.  With LVM or a RAID configuration
these could just be added to the storage pool, if bayfront has
sufficient slots for three more disks.

-- 
Ricardo


* Re: Update on bordeaux.guix.gnu.org
  2021-11-28 19:54   ` Ricardo Wurmus
@ 2021-12-01 17:42     ` Ludovic Courtès
  2021-12-01 22:04       ` Ricardo Wurmus
  2021-12-03 10:17     ` Update " Christopher Baines
  1 sibling, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2021-12-01 17:42 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hi,

Ricardo Wurmus <rekado@elephly.net> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>>> The disk space usage trend is pretty much
>>> linear, so if things continue without any change, I think it will
>>> be
>>> necessary to pause the agents within a month, to avoid filling up
>>> bayfront entirely.
>>
>> Ah, bummer.  I hope we can find a solution one way or another.
>> Certainly we could replicate nars on another machine with more disk,
>> possibly buying the necessary hardware with the project funds.
>
> Remember that I’ve got three 256G SSDs here that I could send to
> wherever bayfront now sits.  With LVM or a RAID configuration
> these could just be added to the storage pool — if bayfront has
> sufficient slots for three more disks.

Good to know.  In that case we’d need to come up with (1) an updated
Guix System config with LVM, and (2) a way to copy the existing store
over to the new storage, which sounds tricky if the existing disk is to
be kept.  (Also I think we’re down to 1.5 person who could go on
site. :-/)

Ludo’.


* Re: Update on bordeaux.guix.gnu.org
  2021-12-01 17:42     ` Ludovic Courtès
@ 2021-12-01 22:04       ` Ricardo Wurmus
  2021-12-06 12:51         ` Upgrading storage " Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Ricardo Wurmus @ 2021-12-01 22:04 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Hi,

[space is running out on bayfront, so I wrote:]

> Remember that I’ve got three 256G SSDs here that I could send to
> wherever bayfront now sits.  With LVM or a RAID configuration 
> these could just be added to the storage pool — if bayfront has
> sufficient slots for three more disks.

You wrote in response:

> Good to know.  In that case we’d need to come up with (1) an 
> updated
> Guix System config with LVM, and (2) a way to copy the existing 
> store
> over to the new storage, which sounds tricky if the existing 
> disk is to
> be kept.

We could first install Guix System with the adjusted bayfront
config on a separate machine (e.g. on a build node at the MDC),
onto a volume with LVM (using as many of the SSDs as needed).
We'd copy the signing keys etc. from bayfront.  Then we’d pretty much
export/import the bayfront store over the network.  Once
everything has been copied, we turn off bayfront, swap the disks,
and boot it up again.  If everything works all right we add the
original disk (and any unused left-over disks) to the LVM volume
to extend the storage pool.
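
The last step of the plan above, adding a disk to an existing LVM
volume, usually comes down to three standard LVM commands: pvcreate,
vgextend and lvextend. The Python sketch below only assembles and prints
them as a dry run. The volume group, logical volume and device names are
placeholders and do not describe bayfront's real layout.

```python
import subprocess


def extend_pool_commands(volume_group, logical_volume, new_devices):
    """Build the LVM commands that add disks to an existing volume."""
    commands = [["pvcreate", device] for device in new_devices]
    # Add the new physical volumes to the existing volume group.
    commands.append(["vgextend", volume_group, *new_devices])
    # Grow the logical volume into all free space and resize the
    # file system on it in the same step.
    commands.append(
        [
            "lvextend",
            "--resizefs",
            "--extents",
            "+100%FREE",
            f"/dev/{volume_group}/{logical_volume}",
        ]
    )
    return commands


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder names, printed only; nothing is executed.
    run(extend_pool_commands("examplevg", "store", ["/dev/sdX"]))
```

Keeping the default as a dry run seems prudent, since these commands
modify disks and need root.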

The trickiest bit is to minimize the time between finishing the 
sync and swapping the disks.

> (Also I think we’re down to 1.5 person who could go on
> site. :-/)

Not great :-/

-- 
Ricardo


* Re: Update on bordeaux.guix.gnu.org
  2021-11-28 17:26 ` Ludovic Courtès
  2021-11-28 19:54   ` Ricardo Wurmus
@ 2021-12-03  9:39   ` Christopher Baines
  1 sibling, 0 replies; 10+ messages in thread
From: Christopher Baines @ 2021-12-03  9:39 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel


Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> I've been doing some performance tuning, submitting builds is now more
>> parallelised, a source of slowness when fetching builds has been
>> addressed, and one of the long queries involved in allocating builds has
>> been removed, which also improved handling of the WAL (Sqlite write
>> ahead log).
>>
>> There's also a few new features. Agents can be deactivated which means
>> they won't get any builds allocated. The coordinator now checks the
>> hashes of outputs which are submitted, a safeguard which I added because
>> the coordinator now also supports resuming the uploads of outputs. This
>> is particularly important when trying to upload large (> 1GiB) outputs
>> over slow connections.
>>
>> I also added a new x86_64 build machine. It's a 4 core Intel NUC that I
>> had sitting around, but I cleaned it up and got it building things. This
>> was particularly useful as I was able to use it to retry building
>> guile@3.0.7, which is extremely hard to build [2]. This was blocking
>> building the channel instance derivations for x86_64-linux.
>>
>> 2: https://data.guix.gnu.org/gnu/store/7k6s13bzbz5fd72ha1gx9rf6rrywhxzz-guile-3.0.7.drv
>
> Neat!  (Though I wouldn’t say building Guile is “extremely hard”,
> especially on x86_64.  :-))  The ability to keep retrying is much
> welcome.

To rephrase, I found it extremely hard to get that particular Guile
derivation to build successfully: it failed to build 12 times, and only
succeeded when I added new hardware to attempt it on (I'm guessing the
particular issue I was encountering was exacerbated by more cores).

Unfortunately, I also think that you finding it easy to build actually
contributes to the problem here, since it makes finding and addressing
issues like this harder.

>> Space is running out on bayfront, the machine that runs the coordinator,
>> stores all the nars and build logs, and serves the substitutes. I knew
>> this was probably going to be an issue, bayfront didn't have much space
>> to begin with, but I had hoped I'd be further forward in developing some
>> way to allow moving the nars around between multiple machines, to remove
>> the need to store all of them on bayfront. I have got a plan, there's
>> some ideas I mentioned back in February [4], but I haven't got around to
>> implementing anything yet. The disk space usage trend is pretty much
>> linear, so if things continue without any change, I think it will be
>> necessary to pause the agents within a month, to avoid filling up
>> bayfront entirely.
>
> Ah, bummer.  I hope we can find a solution one way or another.
> Certainly we could replicate nars on another machine with more disk,
> possibly buying the necessary hardware with the project funds.

Since this email got a bit delayed when I sent it, things have moved on
a bit now.

90% disk usage was the threshold I had in mind for bayfront, and that's
now pretty much been reached, so I've paused all the agents. My plans
for how to address this have also developed a bit, but it's still
going to take a month at least to get things going again.

Chris

* Re: Update on bordeaux.guix.gnu.org
  2021-11-28 19:54   ` Ricardo Wurmus
  2021-12-01 17:42     ` Ludovic Courtès
@ 2021-12-03 10:17     ` Christopher Baines
  2021-12-03 11:18       ` Ricardo Wurmus
  1 sibling, 1 reply; 10+ messages in thread
From: Christopher Baines @ 2021-12-03 10:17 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>>> The disk space usage trend is pretty much
>>> linear, so if things continue without any change, I think it will
>>> be
>>> necessary to pause the agents within a month, to avoid filling up
>>> bayfront entirely.
>>
>> Ah, bummer.  I hope we can find a solution one way or another.
>> Certainly we could replicate nars on another machine with more disk,
>> possibly buying the necessary hardware with the project funds.
>
> Remember that I’ve got three 256G SSDs here that I could send to
> wherever bayfront now sits.  With LVM or a RAID configuration
> these could just be added to the storage pool — if bayfront has
> sufficient slots for three more disks.

While it would be nice for bayfront to have an SSD, it might actually be
more valuable to use those for some of the machines that do more of the
building.

harbourfront currently has a broken hard drive (I believe), and
milano-guix-1 has some slow hard drives that impede it building
things. I've CC'ed Andreas as I think he knows more about harbourfront,
and I'll follow up about milano-guix-1 off list.

* Re: Update on bordeaux.guix.gnu.org
  2021-12-03 10:17     ` Update " Christopher Baines
@ 2021-12-03 11:18       ` Ricardo Wurmus
  0 siblings, 0 replies; 10+ messages in thread
From: Ricardo Wurmus @ 2021-12-03 11:18 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel


Christopher Baines <mail@cbaines.net> writes:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>>
>>>> The disk space usage trend is pretty much
>>>> linear, so if things continue without any change, I think it will
>>>> be
>>>> necessary to pause the agents within a month, to avoid filling up
>>>> bayfront entirely.
>>>
>>> Ah, bummer.  I hope we can find a solution one way or another.
>>> Certainly we could replicate nars on another machine with more disk,
>>> possibly buying the necessary hardware with the project funds.
>>
>> Remember that I’ve got three 256G SSDs here that I could send to
>> wherever bayfront now sits.  With LVM or a RAID configuration
>> these could just be added to the storage pool — if bayfront has
>> sufficient slots for three more disks.
>
> While it would be nice for bayfront to have an SSD, it might actually be
> more valuable to use those for some of the machines that do more of the
> building.
>
> harbourfront currently has a broken hard drive (I believe), and
> milano-guix-1 has some slow hard drives that impede it building
> things. I've CC'ed Andreas as I think he knows more about harbourfront,
> and I'll follow up about milano-guix-1 off list.

Okay, thank you.

Note that these disks are (nominally) 250G each.  The spinning platter
disks might be larger than that, so this should be taken into account
when replacing disks.

-- 
Ricardo


* Upgrading storage on bordeaux.guix.gnu.org
  2021-12-01 22:04       ` Ricardo Wurmus
@ 2021-12-06 12:51         ` Ludovic Courtès
  2021-12-07 19:26           ` Maxim Cournoyer
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2021-12-06 12:51 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hello!

(+Cc: Andreas.)

Ricardo Wurmus <rekado@elephly.net> skribis:

> [space is running out on bayfront, so I wrote:]
>
>> Remember that I’ve got three 256G SSDs here that I could send to
>> wherever bayfront now sits.  With LVM or a RAID configuration these
>> could just be added to the storage pool — if bayfront has
>> sufficient slots for three more disks.
>
> You wrote in response:
>
>> Good to know.  In that case we’d need to come up with (1) an updated
>> Guix System config with LVM, and (2) a way to copy the existing
>> store
>> over to the new storage, which sounds tricky if the existing disk is
>> to
>> be kept.
>
> We could first install Guix System with the adjusted bayfront config
> on a separate machine (e.g. on a build node at the MDC), onto a volume
> with LVM (using as many of the SSDs as needed). Copy signing keys etc
> from bayfront.  Then we’d pretty much export/import the bayfront store
> over the network.  Once everything has been copied, we turn off
> bayfront, swap the disks, boot it up again.  If everything works all
> right we add the original disk (and any unused left-over disks) to the
> LVM volume to extend the storage pool.

Sounds like a plan.  But note that there’s the store and there’s the
cached nars, though maybe we can tolerate missing, say, a week or two of
nars.

It would be more convenient to do that with a machine already in the
vicinity of bayfront though, so we can more easily move the disks there
when we’re ready.  Maybe we could use a machine at Inria or the math
institute next door.  Andreas, WDYT?  (We can work out the details
off-list.)

>> (Also I think we’re down to 1.5 person who could go on
>> site. :-/)
>
> Not great :-/

Here’s a call: if you’re in the vicinity of Bordeaux, France, and
would like to help, please get in touch with Andreas and myself!  We
need to increase our tramway factor.

Cheers,
Ludo’.


* Re: Upgrading storage on bordeaux.guix.gnu.org
  2021-12-06 12:51         ` Upgrading storage " Ludovic Courtès
@ 2021-12-07 19:26           ` Maxim Cournoyer
  0 siblings, 0 replies; 10+ messages in thread
From: Maxim Cournoyer @ 2021-12-07 19:26 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> Hello!
>
> (+Cc: Andreas.)
>
> Ricardo Wurmus <rekado@elephly.net> skribis:
>
>> [space is running out on bayfront, so I wrote:]
>>
>>> Remember that I’ve got three 256G SSDs here that I could send to
>>> wherever bayfront now sits.  With LVM or a RAID configuration these
>>> could just be added to the storage pool — if bayfront has
>>> sufficient slots for three more disks.
>>
>> You wrote in response:
>>
>>> Good to know.  In that case we’d need to come up with (1) an updated
>>> Guix System config with LVM, and (2) a way to copy the existing
>>> store
>>> over to the new storage, which sounds tricky if the existing disk is
>>> to
>>> be kept.
>>
>> We could first install Guix System with the adjusted bayfront config
>> on a separate machine (e.g. on a build node at the MDC), onto a volume
>> with LVM (using as many of the SSDs as needed). Copy signing keys etc
>> from bayfront.  Then we’d pretty much export/import the bayfront store
>> over the network.  Once everything has been copied, we turn off
>> bayfront, swap the disks, boot it up again.  If everything works all
>> right we add the original disk (and any unused left-over disks) to the
>> LVM volume to extend the storage pool.
>
> Sounds like a plan.  But note that there’s the store and there’s the
> cached nars, though maybe we can tolerate missing, say, a week or two of
> nars.

I don't know anything about the specifics of the bayfront machine, but
if it were running a Btrfs file system, it could be extended live by
adding the new drives to it in a Btrfs RAID0 configuration.

Cheers,

Maxim

