From: Christopher Baines <mail@cbaines.net>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 59078@debbugs.gnu.org
Subject: [bug#59078] [PATCH] lint: Split the derivation lint checker by system.
Date: Tue, 15 Nov 2022 08:35:42 +0000
Message-ID: <87leocfsnm.fsf@cbaines.net>
In-Reply-To: <87h6z1puli.fsf@gnu.org>

Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> The ‘derivation’ checker was added for this purpose: making sure that a
>>> package’s derivation can be computed for all the supported systems.
>>> Previously it was easy to overlook that kind of breakage.
>>>
>>> I think it’s important to keep a ‘derivation’ checker that does this.
>>
>> What aspect of it do you think is important?
>
> I meant that it’s important to have a single ‘derivation’ checker that
> checks derivations for all the supported systems.  Packagers should be
> able to run ‘guix lint -c derivation PKG’ and be confident that it’s
> fine for all systems.

I think we can still keep that by adding support for grouping lint
checkers. So have a derivation group, instead of a single checker.

Maybe that'll change the command slightly to 'guix lint -g derivation
PKG', but I think that can be equivalent.
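
A rough sketch of the grouping idea, in Python rather than Guile and not
taken from the Guix code: one checker per system, all registered under a
single "derivation" group, so running the group still covers every
system. All names here (GROUPS, lint_group, the system list, the
"-x86-only" failure rule) are hypothetical and only illustrate the shape.

```python
# Illustrative sketch only, not Guix code: every name here is hypothetical.
from typing import Callable, Dict, List

Checker = Callable[[str], List[str]]  # package name -> list of warnings


def make_derivation_checker(system: str) -> Checker:
    """Return a checker that only looks at one system."""
    def check(package: str) -> List[str]:
        # Stand-in for computing the derivation: pretend that packages
        # named "*-x86-only" fail on every system except x86_64-linux.
        if package.endswith("-x86-only") and system != "x86_64-linux":
            return [f"{package}: failed to compute derivation for {system}"]
        return []
    return check


SYSTEMS = ["x86_64-linux", "aarch64-linux", "i686-linux"]

# One checker per system, all registered under a single group name.
GROUPS: Dict[str, List[Checker]] = {
    "derivation": [make_derivation_checker(s) for s in SYSTEMS],
}


def lint_group(group: str, package: str) -> List[str]:
    """Run every checker in the group on the package and collect warnings."""
    warnings: List[str] = []
    for checker in GROUPS[group]:
        warnings.extend(checker(package))
    return warnings
```

Running the whole group gives the same coverage as a single all-systems
checker, while a caller that cares about memory can still run the
group's members one system at a time.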

>>> Now, the memory consumption you report is unacceptable and this needs to
>>> be addressed.
>>>
>>> Most (all?) caches are now per-session (associated with
>>> <store-connection>).  Since ‘guix lint’ uses a single session, those
>>> caches keep growing because there’s no eviction mechanism in place.
>>>
>>> A hack like the one below should work around that.  Could you check how
>>> well it works for you?
>>
>> I tried in the Guix Data Service processing packages in chunks of 1000
>> plus closing the store connection after each batch,
>
> How was it implemented?  Was it after the caches came into
> <store-connection>?

I'm not sure what aspect of the implementation is important, but I think
it's working correctly. Closing the store connection wasn't very
easy. Previously there was a fresh store connection with each call to
inferior-eval-with-store, but for this test I close the connections in
%store-table and then clear the hash table between the
inferior-eval-with-store calls.

>> and that led to a heap size of 3090MiB. But this is still higher than
>> 1778MiB heap usage I got just by splitting the derivation linter.
>
> I didn’t take the time to do it, but it would be nice to see, with the
> patch I gave, how ‘guix lint -c derivations’ behaves.

I've put some numbers below. With no changes, the last batch to finish
processing leaves the heap at 7755MiB [1], and Guile crashes after
that.

With the patch you sent, the heap size seems to stabilise at 4042MiB
[2]. It also crashes at the end due to the match block not matching '(),
but that's not important.

I also hacked the lint script to run the checkers in the same manner as
the Guix Data Service, so one checker at a time rather than one package
at a time. That led to a max heap size of 3505MiB [3].
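
To make the difference in loop order concrete, here is a minimal Python
sketch (not the actual lint script or Guix Data Service code; the
function names and the (checker, package, warning) result shape are
assumptions). Both orders produce the same warnings; only the nesting of
the loops changes, which affects what has to stay live at any one time.

```python
# Illustrative sketch only: names and result shape are hypothetical.
from typing import Callable, Dict, List, Sequence, Tuple

Checker = Callable[[str], List[str]]
Result = Tuple[str, str, str]  # (checker name, package, warning)


def lint_package_at_a_time(packages: Sequence[str],
                           checkers: Dict[str, Checker]) -> List[Result]:
    """Outer loop over packages: every checker runs on each package in turn."""
    results: List[Result] = []
    for package in packages:
        for name, checker in checkers.items():
            results.extend((name, package, w) for w in checker(package))
    return results


def lint_checker_at_a_time(packages: Sequence[str],
                           checkers: Dict[str, Checker]) -> List[Result]:
    """Outer loop over checkers: each checker sees all packages before the next."""
    results: List[Result] = []
    for name, checker in checkers.items():
        for package in packages:
            results.extend((name, package, w) for w in checker(package))
    return results
```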

By adding in batching (as the Guix Data Service already does), I think
it's possible to further reduce this to the 1778MiB number I give above.

Reducing the memory usage helps reduce the cost and improve the
throughput of loading data into the Guix Data Service, which is my
primary motivation here. I'm concerned not only with reducing the peak
memory usage, but also with having an implementation that'll gracefully
handle more systems being supported in the future.

It's for that second point that I think arranging the derivation linting
so that it's possible to process each system in turn is important for
the Guix Data Service, so that when new platforms are added, the memory
usage won't grow as much.
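
As a rough illustration of that arrangement (a Python sketch with
hypothetical names, not Guix Data Service code): process one system at a
time, in batches, with a fresh cache per batch standing in for a fresh
store connection. The dict below only models the per-session caches; how
much memory Guile actually releases depends on its garbage collector,
which this sketch does not capture.

```python
# Illustrative sketch only: names are hypothetical, and the dict merely
# stands in for the caches attached to a store connection.
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Sequence


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def peak_cache_size(packages: Sequence[str], systems: Sequence[str],
                    batch_size: int = 1000) -> int:
    """Process each system in turn, in batches, and report the largest
    number of cache entries alive at once."""
    peak = 0
    for system in systems:
        for batch in batches(packages, batch_size):
            cache: Dict[str, str] = {}  # fresh "session" for this batch
            for package in batch:
                # Stand-in for computing and caching a derivation.
                cache[package] = f"{package}-{system}.drv"
            peak = max(peak, len(cache))
            # The cache is dropped here, analogous to closing the connection.
    return peak
```

In this model the peak is bounded by the batch size and does not depend
on the number of systems, which is the property wanted when new
platforms are added.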


1: no batching, one derivation checker
1065.0 MiB
1409.0 MiB
2089.0 MiB
2297.0 MiB
2513.0 MiB
2705.0 MiB
3077.0 MiB
3373.0 MiB
3557.0 MiB
3661.0 MiB
3901.0 MiB
3997.0 MiB
4147.0 MiB
4491.0 MiB
4635.0 MiB
5899.0 MiB
7755.0 MiB

2: batches of 1000 with fresh store connection, one derivation checker
1057.0 MiB
1137.0 MiB
1481.0 MiB
1481.0 MiB
1481.0 MiB
1481.0 MiB
1697.0 MiB
1697.0 MiB
1697.0 MiB
1761.0 MiB
1761.0 MiB
1761.0 MiB
1937.0 MiB
1977.0 MiB
1985.0 MiB
3065.0 MiB
3633.0 MiB
4041.0 MiB
4042.0 MiB
4042.0 MiB
4042.0 MiB

3: multiple derivation checkers, batched by system
1297.0 MiB
1545.0 MiB
1873.0 MiB
2113.0 MiB
2353.0 MiB
2609.0 MiB
2961.0 MiB
3193.0 MiB
3505.0 MiB


