From: Ricardo Wurmus
To: gwl-devel@gnu.org
Subject: fastest way to run a GWL workflow on AWS
Date: Mon, 06 Jul 2020 11:52:04 +0200
Message-ID: <87a70dkm2j.fsf@elephly.net>

Hey there,

I had an idea to get a GWL workflow to run on AWS without having to mess
with Docker and all that.

GWL should do all of these steps when AWS deployment is requested:

* create an EFS file system. Why EFS? Unlike EBS (block storage) and
  S3, one EFS can be accessed simultaneously by different virtual
  machines (EC2 instances).

* sync the closure of the complete workflow (all steps) to EFS. (How?
  We could either mount EFS locally or use an EC2 instance as a simple
  “cloud” file server.) This differs from how other workflow languages
  handle things. Other workflow systems have one or more Docker
  image(s) per step (sometimes one Docker image per application), which
  means that there is some duplication and setup time as Docker images
  are downloaded from a registry (where they have previously been
  uploaded). Since Guix knows the closure of all programs in the
  workflow we can simply upload all of it.

* create as many EC2 instances as requested (respecting optional
  grouping information to keep any set of processes on the same node)
  and mount the EFS over NFS. The OS on the EC2 instances doesn’t
  matter.

* run the processes on the EC2 instances (parallelizing as far as
  possible) and have them write to a unique directory on the shared
  EFS. The rest of the EFS is used as a read-only store to access all
  the Guix-built tools.
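
The placement part of the steps above could be sketched roughly as
follows. This is only an illustration in Python, not GWL code: the
`Process` record, the `group` hint, the `/mnt/efs` mount point and the
round-robin policy are all assumptions made for the example, and it
assumes process names are unique so that each output directory is too.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Process:
    """Hypothetical stand-in for a workflow process."""
    name: str
    group: Optional[str] = None  # hypothetical grouping hint


def plan_placement(processes, max_instances, efs_root="/mnt/efs"):
    """Map instance index -> list of process assignments.

    Processes sharing a group land on the same instance; every process
    gets its own output directory below the (assumed) EFS mount point.
    """
    if max_instances < 1:
        raise ValueError("need at least one instance")

    # Grouped processes share a bucket; ungrouped ones get their own.
    buckets = defaultdict(list)
    for p in processes:
        key = ("group", p.group) if p.group is not None else ("solo", p.name)
        buckets[key].append(p)

    # Distribute whole buckets round-robin over the requested instances.
    placement = defaultdict(list)
    for i, bucket in enumerate(buckets.values()):
        node = i % max_instances
        for p in bucket:
            placement[node].append({
                "process": p.name,
                "output_dir": f"{efs_root}/outputs/{p.name}",
            })
    return dict(placement)
```

Calling `plan_placement` with three processes, two of which share a
group, and two instances puts the grouped pair on one instance and the
remaining process on the other. Mapping resource requirements to EC2
machine sizes is left out here.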
The EFS either stays active or its contents are archived to S3 upon
completion to reduce storage costs.

The last two steps are obviously a little vague; we’d need to add a few
knobs to allow users to easily tweak resource allocation beyond what the
GWL currently offers (e.g. grouping, mapping resources to EC2 machine
sizes.) To implement the last step we would need to keep track of step
execution. We can already do this, but the complication here is to
effect execution on the remote nodes.

I also want to add optional reporting for each step. There could be a
service that listens to events and each step would trigger events to
indicate start and stop of each step. This could trivially be
visualized, so that users can keep track of the state of the workflow
and its processes, e.g. with a pretty web interface.

For the deployment to AWS (and eventual tear-down) we can use Guile AWS.
None of this depends on “guix deploy”, which I think would be a poor fit
as these virtual machines are meant to be disposable.

Another thing I’d like to point out is that this doesn’t lead users down
the AWS rabbit hole. We don’t use specialized AWS services like their
cluster/grid service, nor do we use Docker, nor ECS, etc. We use the
simplest resource types: plain EC2 and boring NFS storage. This looks
like one of the simplest remote execution models, which could just as
well be used with other remote compute providers (or even a custom
server farm).

One of the open issues is to figure out how to sync the /gnu/store items
to EFS efficiently. I don’t really want to shell out to rsync, nor do I
want to use “guix copy”, which would require a remote installation of
Guix. Perhaps rsync would be the easiest route for a rough first draft.
It would also be nice if we could deduplicate our slice of the store to
cut down on unnecessary traffic to AWS.

What do you think about this?

-- 
Ricardo
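
On the open question of deduplicating the slice of the store before
upload, one possible approach is to hash file contents and send each
distinct content only once. The Python sketch below illustrates the
idea under stated assumptions: `plan_upload` and `remote_digests` are
hypothetical names, SHA-256 is an arbitrary choice, symlinks are
skipped, and how duplicates would be recreated on the EFS side (for
example as hard links) is left open.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_upload(store_items, remote_digests=frozenset()):
    """Decide which files actually need to be transferred.

    Returns (uploads, duplicates): `uploads` maps digest -> first path
    seen with that content, `duplicates` lists (path, canonical path)
    pairs whose content is already covered by an upload. Content whose
    digest is in `remote_digests` (assumed to be known to exist on the
    remote side already) is skipped entirely.
    """
    uploads = {}
    duplicates = []
    for item in store_items:
        root = Path(item)
        # A store item may be a single file or a directory tree.
        candidates = [root] if root.is_file() else sorted(root.rglob("*"))
        for path in candidates:
            if path.is_symlink() or not path.is_file():
                continue
            digest = file_digest(path)
            if digest in remote_digests:
                continue
            if digest in uploads:
                duplicates.append((path, uploads[digest]))
            else:
                uploads[digest] = path
    return uploads, duplicates
```

Only the paths in `uploads` would need to cross the network; whether
this saves enough traffic to be worth the hashing cost compared to a
plain rsync run is something that would have to be measured.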