From: Josh Marshall
Date: Sun, 15 Oct 2023 14:21:59 -0400
Subject: Re: Architecture to reduce download time when pulling multiple packages – historic success with magnet URLs, BTIHs, & Aria2c!
To: "James R. Haigh (+ML.GNU.Guix subaddress)"
Cc: help-guix@gnu.org

So it sounds like my first steps are to re-implement the downloads using aria2c. This would affect the minimum base package, no? Can I get some buy-in from maintainers that such changes are acceptable?

On Fri, Oct 13, 2023 at 2:06 PM James R. Haigh (+ML.GNU.Guix subaddress) wrote:
>
> Hi Josh,
>
> At Z-0400=2023-10-13Fri12:36:01, Josh Marshall sent:
> > This is to parallelize connections, which should never hurt downloading but can help. Mirroring would be parallelizing for providing packages; what I want to implement is to parallelize obtaining packages. Server side vs. client side.
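One way to start re-implementing the downloads with aria2c is its input-file format (`aria2c -i FILE`), which works before any daemon code exists. In that format, URIs placed on one line and separated by TAB are treated as sources for the same file. Lines that follow and begin with whitespace carry per-file options such as `out` and `checksum`.

The sketch below is minimal and rests on a hypothetical `Download` record holding an output name, mirror URLs and an optional SHA-256 digest. The URLs are placeholders, and nothing here reflects how Guix currently fetches substitutes.

```python
"""Hypothetical helper: write an aria2c input file (for `aria2c -i FILE`).

Each package gets one line of TAB-separated mirror URLs (aria2c treats
them as sources for the same file), followed by indented option lines.
The Download record and the example URLs are illustrative assumptions.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Download:
    out: str                    # file name to save the download as
    mirrors: list[str]          # URLs that all serve identical bytes
    sha256: str | None = None   # optional hex digest for aria2c to verify


def to_aria2_input(downloads: list[Download]) -> str:
    lines: list[str] = []
    for d in downloads:
        # URIs on one TAB-separated line are mirrors of a single file.
        lines.append("\t".join(d.mirrors))
        # Option lines must start with whitespace; one option per line.
        lines.append(f"  out={d.out}")
        if d.sha256:
            lines.append(f"  checksum=sha-256={d.sha256}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_aria2_input([
        Download("hello.nar",
                 ["https://mirror-a.example/hello.nar",
                  "https://mirror-b.example/hello.nar"]),
    ]), end="")
```

The resulting file could then be run with something like `aria2c -i downloads.txt -j 5`. The `-j` option caps how many files are fetched at once, while each file may still be pulled from several of its mirrors.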
>
> Please, if you are going to do something like this, use a torrent architecture like BitTorrent or GNUnet – I suggest Aria2c as a very good CLI download backend that can be daemonised and sent instructions over a socket to add, pause, remove downloads, etc., and it supports magnet URLs, including the existing nontorrent servers (via ‘as’ parameters, iirc.).
>
> I actually implemented this in a local copy of APT Daemon many years ago (circa 2011), but the change was not accepted upstream on Launchpad (because I was not on the bleeding edge; I was too slow to keep up with upstream development). My fork got forgotten about, because to get the full benefit the server would have had to add a BitTorrent Info Hash (BTIH) to the metadata of each package, alongside the MD5, SHA-256, etc. that it already provided (not a big ask, really). That said, even without the full benefit of that metadata, it did provide immediate benefit, and I used it for many years, not upgrading the Ubuntu 11.04 Natty Narwhal that I was using back then until I really had to.
>
> The immediate benefit that it provided was exactly as you described: it allowed parallelisation of nontorrent downloads, be it from the same server or from multiple mirrors. Iirc., I achieved this by simply passing the download list to Aria2c in daemon mode; I think I also converted all the HTTP URLs to ‘as’ parameters in magnet links, so that multiple mirrors could be passed using multiple ‘as’ parameters in each magnet link. Then I simply relied on Aria2c being amazing at parallelising everything that I had given it! I also implemented progress updates so that APT Daemon could reflect where Aria2c was up to.
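The "daemonised and sent instructions over a socket" part maps onto aria2c's RPC interface. Below is a minimal sketch of the add, pause, resume, cancel and progress calls described above. It assumes a local daemon started with `aria2c --enable-rpc` on the default port 6800 and no `--rpc-secret`.

The wrapper names (`add`, `pause`, `resume`, `cancel`, `progress`) are hypothetical. The `aria2.*` method names are aria2's own. aria2's documented way to give it several mirrors of one file is to pass them all in a single `aria2.addUri` call.

```python
"""Hypothetical sketch: control a daemonised aria2c over JSON-RPC.

Assumes aria2c was started as `aria2c --enable-rpc` (default port 6800)
without --rpc-secret; with a secret, "token:SECRET" must be the first param.
"""
from __future__ import annotations

import itertools
import json
import urllib.request

RPC_URL = "http://localhost:6800/jsonrpc"
_ids = itertools.count(1)


def rpc_request(method: str, *params) -> bytes:
    """Encode one JSON-RPC 2.0 request body for aria2."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": str(next(_ids)),
        "method": method,
        "params": list(params),
    }).encode()


def call(method: str, *params):
    """POST a request to the running daemon and return its result."""
    req = urllib.request.Request(
        RPC_URL, data=rpc_request(method, *params),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(body["error"]["message"])
    return body["result"]


def add(mirrors: list[str], out: str) -> str:
    """Queue one file; every URI must point at the same bytes. Returns a GID."""
    return call("aria2.addUri", mirrors, {"out": out})


def pause(gid: str) -> None:
    call("aria2.pause", gid)


def resume(gid: str) -> None:
    call("aria2.unpause", gid)


def cancel(gid: str) -> None:
    call("aria2.remove", gid)


def progress(gid: str) -> float:
    """Fraction completed, via aria2.tellStatus (lengths arrive as strings)."""
    st = call("aria2.tellStatus", gid, ["totalLength", "completedLength"])
    total = int(st["totalLength"])
    return int(st["completedLength"]) / total if total else 0.0
```

With the daemon running, calling `add(...)` once per package and polling `progress(gid)` is roughly the loop needed to feed progress back into a front end, as described for APT Daemon.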
>
> The way I implemented this using Aria2c and magnet URLs meant that if additional hashes were known, they could be used as well; so if the server metadata simply added BTIHs, swarming could occur, which in turn would massively reduce load on the central servers and allow anyone who wants to be a mirror to become one simply by seeding indefinitely. A default share ratio of 1.0 means that no user is a burden on the network unless they deliberately change it. Users can donate to the running costs of the project simply by increasing their share ratio, which adds another means of contribution that they may find easier than the others.
>
> Anyone keen to keep old packages online can simply seed them indefinitely, so this is also really great for archival purposes. Even if the central project loses interest in the old packages and deletes them, anyone else can keep them up. The hashes ensure that they have not been tampered with.
>
> There is also a really cool benefit that occurs, or can occur, on a LAN. An entire network of computers can all swarm locally with each other, so that each package needs to be downloaded through the metered last-mile bottleneck from the WAN precisely once, provided that local broadcasting is supported. I think this requires Avahi, and I seem to remember that Aria2c supports this, but I can't be sure. I don't remember ever getting this bit working, but I also did not try hard, because it would have required metadata that I didn't have until after the download; so even if I had got it working, it would not have been directly useful unless the APT repositories that I was using included the BTIHs.
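The idea of servers publishing a BTIH per package comes with one caveat. A BitTorrent v1 info hash is the SHA-1 of the bencoded `info` dictionary (BEP 3), not a hash of the file contents alone. As a result, it also pins down the file name and the piece length.

The sketch below is minimal and covers a single file. The 256 KiB default piece length and the function names are assumptions for illustration, not anything Guix or APT defines.

```python
"""Sketch: compute the BitTorrent v1 info hash (BTIH) of a single file.

The BTIH is the SHA-1 of the bencoded "info" dictionary (BEP 3), so it
changes if the name or piece length changes, not only the content.
"""
import hashlib


def bencode(obj) -> bytes:
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode()
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Dictionary keys are byte strings, sorted as raw bytes.
        items = sorted(((k.encode() if isinstance(k, str) else k), v)
                       for k, v in obj.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")


def btih(data: bytes, name: str, piece_length: int = 256 * 1024) -> str:
    # "pieces" is the concatenated 20-byte SHA-1 digest of every piece.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {"length": len(data), "name": name,
            "piece length": piece_length, "pieces": pieces}
    return hashlib.sha1(bencode(info)).hexdigest()
```

Because the name and piece length feed into the hash, a server that publishes BTIHs would also need to fix those two values for every package. Otherwise independently computed hashes would not match.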
>
> So yeah, loads of great benefits to this architecture, and I highly recommend it:
>
> - convert all existing URLs to magnet links (this can be done client-side, as I did, or server-side);
> - optionally add any additional mirrors as additional ‘as’ parameters (again client-side or server-side);
> - add ‘btih’ parameters to the magnet links (the BTIH must be included in the server metadata to get the full benefit of the swarming, but conversion to magnet link format can be done client-side or server-side);
> - then simply pass all this to a really good parallelising backend such as Aria2c;
> - then update any progress data and relay pause, resume, cancel, etc. to the backend.
>
> One final note, as I am sure that there are a lot of GNUnet fans on this list: I would try Aria2c first to see how well it can work, and then try GNUnet or whatever else once you have a standard to benchmark against. Both are Free Software, so no concern there. Aria2c is an all-round download manager CLI that works with or without swarming, i.e. it is just as good at HTTPS as it is at BitTorrent, and can do both at the same time. GNUnet has the advantage of working from SHA-256, iirc., which is generally already included in the metadata of the repositories of various distributions, but I think it lacks many of the features, the stability, and the ecosystem of alternative backends that the BitTorrent network has.
>
> Of course, there is no harm in including other hashes along with the BTIH, to allow people to experiment with alternative backends, while always ensuring that what works works well. Another hash that may be useful to include is the Tiger Tree Hash, which is structurally very similar to the BTIH, but stronger, iirc.
>
> The first thing that the Guix project can do to signal interest in this architecture is simply to include the BTIH of each package in the repository metadata.
> Whether it is in magnet URL form or not does not matter, because the client can later convert it as needed. The important thing is an authoritative statement in the metadata that this version of this package has this BTIH. Once that metadata is available, the game is on to implement swarming support, be it with Aria2c as a backend (which I recommend at least starting with) or otherwise.
>
> I know that this architecture works well from first-hand experience with APT Daemon, written in Python. The only failure I had with it was lack of upstream support, so I consider it important to first obtain upstream approval before really investing more time into this. I seem to remember suggesting this to the Nix project many years ago and not getting anywhere, and now I don't have the energy to try to improve upstream projects if they reject my ideas, so I'll be interested to see whether you have any success with your attempt to do the same.
>
> Good luck! ;-)
>
> Kind regards,
> James.
> --
> Wealth doesn't bring happiness, but poverty brings sadness.
> Sent from Debian with Claws Mail, using email subaddressing as an alternative to error-prone heuristical spam filtering.
> Postal: James R. Haigh, Middle Farm, Vennington, nr. Westbury, nr. Shrewsbury, Salop, SY5 9RG, Britain
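The recipe in the quoted message can be sketched as a small function. It converts mirror URLs into one magnet link, uses one `as` ("acceptable source") parameter per mirror, and adds `xt=urn:btih:` when the metadata provides a BTIH.

This is a minimal sketch. The function name and example values are hypothetical. Whether a particular client actually fetches from `as` sources is client-specific: James recalls aria2c doing so, and that is worth checking against its manual.

```python
"""Hypothetical sketch: build a magnet link from mirrors and an optional BTIH.

`xt=urn:btih:` carries the info hash, `dn` a display name, and each `as`
parameter one ordinary HTTP(S) URL for the same file, percent-encoded.
"""
from __future__ import annotations

from urllib.parse import quote


def magnet_link(name: str, mirrors: list[str], btih: str | None = None) -> str:
    parts: list[str] = []
    if btih:
        # Only possible once the server metadata publishes the BTIH.
        parts.append(f"xt=urn:btih:{btih}")
    parts.append(f"dn={quote(name, safe='')}")
    # One `as` parameter per mirror; full URLs must be percent-encoded.
    parts.extend(f"as={quote(url, safe='')}" for url in mirrors)
    return "magnet:?" + "&".join(parts)
```

Without a BTIH, the link only bundles mirrors, which is the "immediate benefit" case. Adding the hash later changes nothing for the client except enabling swarming.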