From: Simon Tournier <zimon.toutoune@gmail.com>
To: 65787@debbugs.gnu.org
Subject: bug#65787: time-machine is doing too much network requests
Date: Wed, 06 Sep 2023 18:26:18 +0200
Message-ID: <87wmx3mfo5.fsf@gmail.com>
Hi,
Well, I am on a quest to make Guix more robust in the worst-case
scenario: Savannah is unreachable (as well as other servers). For
context, this matters when using Guix for reproducing scientific
production. For an example of such a worst case, see [1]. Well, this
quote from it:
    The first annoyance is that guix time-machine needs an access to the
    server git.savannah.gnu.org, although the Git repository is already
    cloned and already contains the required commit.
is almost tackled by #65352; at least tracked. :-)
While investigating that, I noticed that the current design is
suboptimal, IMHO. I am reporting it here and I hope to improve the
situation by reducing the number of network requests.
It matters in the worst-case scenario of scientific production. And it
also matters for people with a poor or unstable network link.
Sorry if the report is hard to follow; I did my best to be clear.
To keep the discussion simple, I only consider the Git reference
specifications ’branch’ and ’tag-or-commit’. The Git reference
specifications that various internal procedures use are poorly
documented. See the docstring of the procedure ’update-cached-checkout’
from (guix git) for an idea or the implementation of ’resolve-reference’
for the complete list.
Let us consider only the Git reference specifications:
    (branch . "string")
    (tag-or-commit . "string")
because those are what “guix time-machine” sets from the CLI or reads
from channels.scm files, IIUC.
The command “guix time-machine” starts by calling
’cached-channel-instance’, passing the procedure ’validate-guix-channel’
as an argument.
This procedure ’cached-channel-instance’ starts by collecting the commit
of each channel. It maps over the list of channels using the procedure
’channel-full-commit’. And that procedure calls
’update-cached-checkout’. (1)
Then, ’cached-channel-instance’ calls ’validate-guix-channel’. And this
procedure also calls ’update-cached-checkout’. (2)
Then, ’cached-channel-instance’ calls ’latest-channel-instances’ which
calls ’latest-channel-instance’. And guess what, this procedure also
calls ’update-cached-checkout’. (3)
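To make the three call sites concrete, here is a toy model in Python
(not Guile, and not the actual Guix code) of the chain as described
above. The procedure names mirror the Guix ones, but every body, the
dictionary used as a channel record, the example.org URL and the
placeholder commit ID are assumptions made purely for illustration. The
only point is to count how often ’update-cached-checkout’ is reached for
a single channel.

```python
from collections import Counter

calls = Counter()  # how many times each modeled procedure ran


def update_cached_checkout(url, ref):
    """Stand-in for 'update-cached-checkout': it only records the call."""
    calls["update-cached-checkout"] += 1
    return "0" * 40  # placeholder commit ID, not a real one


def channel_full_commit(channel):
    return update_cached_checkout(channel["url"], channel["ref"])  # (1)


def validate_guix_channel(channels):
    for channel in channels:
        if channel["name"] == "guix":
            update_cached_checkout(channel["url"], channel["ref"])  # (2)


def latest_channel_instance(channel):
    return update_cached_checkout(channel["url"], channel["ref"])  # (3)


def latest_channel_instances(channels):
    return [latest_channel_instance(channel) for channel in channels]


def cached_channel_instance(channels, validate):
    # Order of the three steps follows the description in this report.
    commits = [channel_full_commit(channel) for channel in channels]
    validate(channels)
    return latest_channel_instances(channels)


# Hypothetical channel record; the URL is a placeholder.
guix = {"name": "guix",
        "url": "https://example.org/guix.git",
        "ref": ("branch", "some")}

cached_channel_instance([guix], validate_guix_channel)
print(calls["update-cached-checkout"])  # prints 3
```

If the model matches the real call graph, one channel leads to three
separate visits to ’update-cached-checkout’ before anything is built.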
OK, let us have a look at ’update-cached-checkout’.
This procedure ’update-cached-checkout’ first checks whether the Git
reference specification is already in the cached Git checkout using the
procedure ’reference-available?’.
Consider that the Git reference specification is (branch . "some"), then
’reference-available?’ returns #false, so it triggers ’remote-fetch’
from Guile-Git. If I read correctly, this generates network traffic and
Savannah needs to be reachable. (I)
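The availability check can be sketched as follows. This is a Python
model of the behavior described above, not the Guile implementation.
The helper name ’maybe_fetch’ and the set of cached commit IDs are
hypothetical. The rule that only a full 40-character commit ID already
in the cache counts as available is my assumption; the report itself
only states that (branch . "some") yields #false.

```python
def reference_available(cached_commits, ref):
    """Model of 'reference-available?' for the two specs discussed here."""
    kind, value = ref
    if kind == "branch":
        # As described above: a branch is never considered available,
        # even when the cached checkout already has it.
        return False
    # Assumption: only a full commit ID found in the cache is available.
    return len(value) == 40 and value in cached_commits


def maybe_fetch(cached_commits, ref, fetches):
    """Hypothetical helper: record a fetch when the check fails."""
    if not reference_available(cached_commits, ref):
        fetches.append(ref)  # stands in for 'remote-fetch' (network access)
    return fetches


cached = {"a" * 40}  # pretend this commit is already cloned
fetches = []
maybe_fetch(cached, ("branch", "some"), fetches)
maybe_fetch(cached, ("tag-or-commit", "a" * 40), fetches)
print(fetches)  # prints [('branch', 'some')]
```

Under these assumptions, the branch reference always costs a fetch,
whatever the cached checkout already contains.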
Hmm, I am not convinced anyone is following. Who knows? :-)
Let us continue. ’update-cached-checkout’ then checks some commit
relations and friends. There is an ‘if’ that calls either
’switch-to-ref’ or else ’resolve-reference’. Under the hood, the
procedure ’switch-to-ref’ itself calls ’resolve-reference’.
For the case (branch . "some"), this ’resolve-reference’ procedure calls
’branch-lookup’ from Guile-Git. If I read correctly, this generates
network traffic because of BRANCH-REMOTE and Savannah needs to be
reachable. (II)
Summary: ( (1) + (2) + (3) ) * ( (I) + (II) ) = 6.
If I am correct and if I am not missing something, the current design
requires 6 network exchanges with Savannah, and most of this traffic is
useless because it has already been done, somehow.
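The count in the summary is a product: three call sites, each going
through two network-touching steps. A small Python enumeration makes
that explicit. It assumes that every call site triggers both steps,
which is the (branch . "some") case as I read it; the string labels are
only there for display.

```python
from itertools import product

# The three callers of 'update-cached-checkout' identified above.
call_sites = ["(1) channel-full-commit",
              "(2) validate-guix-channel",
              "(3) latest-channel-instance"]

# The two steps that reach the network for (branch . "some").
network_steps = ["(I) remote-fetch", "(II) branch-lookup"]

round_trips = list(product(call_sites, network_steps))
for site, step in round_trips:
    print(f"{site} -> {step}")
print(len(round_trips))  # prints 6
```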
Well, (branch . "some") is the worst case, IMHO. And so are the short
commit ID (tag-or-commit . "1234abc") and the tag
(tag-or-commit . "v1.4.0").
Applying my proposal from #65352 (DRAFT v2) removes some of the useless
’remote-fetch’ calls.
Well, let me know if this diagnosis is correct.
To be continued…
Cheers,
simon
1: https://simon.tournier.info/posts/2023-06-23-hackathon-repro.html