From: Ricardo Wurmus
To: gwl-devel@gnu.org
Subject: Re: Preparing for a new release
Date: Sun, 09 Feb 2020 14:00:53 +0100

While playing with a real-world workflow I found a few problems:

* Inputs and outputs are not validated

When a process declares that it produces an output but then doesn’t actually produce it, the next process fails with an obscure error message. This is especially confusing when using containerization, because the error is then about failing to map the missing input into the container.

Processes should automatically validate their inputs and outputs. Since inputs and outputs could technically be something other than files, I’m not sure exactly how to do this.

@Roel: can you give an example of inputs / outputs that are not files? I remember that you suggested that inputs might be database queries, for example.

I wonder if we should mark inputs and outputs with types, so that the GWL can know whether something is supposed to be a file or something else. …but just how would “something else” be used in a process?
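For illustration only, here is a minimal sketch of what automatic output validation could look like. It is written in Python rather than Guile and assumes that every declared output is a file path. The names `Process`, `validate_outputs` and `MissingOutputError` are hypothetical and not part of the GWL.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


class MissingOutputError(Exception):
    """Raised when a process did not produce an output it declared."""


@dataclass
class Process:
    # Hypothetical stand-in for a GWL process: a name plus declared file outputs.
    name: str
    outputs: list[str] = field(default_factory=list)


def validate_outputs(process: Process) -> None:
    """Check, right after a process has run, that each declared output exists.

    Failing here names the process at fault, instead of letting the next
    process fail later with an unrelated error (e.g. a container mapping error).
    """
    missing = [out for out in process.outputs if not Path(out).exists()]
    if missing:
        raise MissingOutputError(
            f"process {process.name!r} did not produce: {', '.join(missing)}"
        )
```

Non-file inputs or outputs (such as the database queries mentioned above) would need a different check. That is why typed inputs and outputs would help here.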
* The --output option has no effect

I think the “--output” option should cause all generated files to end up somewhere in the given directory. I wonder whether this should affect *all* generated files or just the final output. If all outputs should be affected, then all *inputs* of downstream processes must be adjusted as well, because they refer to those outputs.

Maybe “--output” is the wrong name. Should it be “--prefix” instead?

* It’s not possible to select more than one tagged item

In my test workflow I’m generating a bunch of inputs by mapping over an argument list. The problem is that I can’t easily select all of these inputs in a code snippet: with the syntax we have, I can only select the first item following a tag.

To address this I’ve extended the accessor syntax, so this works now:

--8<---------------cut here---------------start------------->8---
process frobnicate
  packages "frobnicator"
  inputs
    . genome: "hg19.fa"
    . samples: "a" "b" "c"
  outputs
    . "result"
  # {
    frobnicate -g {{inputs:genome}} --files {{inputs::samples}} > {{outputs}}
  }
--8<---------------cut here---------------end--------------->8---

Note how {{inputs::samples}} is substituted with “a b c”. With just a single colon it would be just “a”. Single colon = single item; double colon = all items following the tag.

* Containerization and directories

Containers cannot be created for processes that write output files into directories that don’t exist yet. That’s because the “containerize” procedure tries to map the directories of all input and output files into the container, and the output directory doesn’t exist yet.

How should this be handled? We could ignore non-existing output directories when creating containers, I suppose. I think that’s the best option: we can’t just create them, because that could break procedures that don’t deal well with pre-existing directories.
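The “ignore non-existing output directories” idea can be sketched as follows. This is a hypothetical Python illustration and not the actual “containerize” procedure. `directories_to_map` is an invented name, and it assumes that inputs and outputs are plain file paths.

```python
from __future__ import annotations

import os


def directories_to_map(inputs: list[str], outputs: list[str]) -> list[str]:
    """Collect the directories to share with a process container.

    The parent directories of inputs are always included. The parent
    directory of an output is only included if it already exists; missing
    ones are skipped rather than created, so nothing is created up front.
    """
    dirs: set[str] = set()
    for path in inputs:
        dirs.add(os.path.dirname(os.path.abspath(path)))
    for path in outputs:
        parent = os.path.dirname(os.path.abspath(path))
        if os.path.isdir(parent):
            dirs.add(parent)
    return sorted(dirs)
```

Skipping a directory only avoids the failure while the container is being set up. Whether the process can then write into that directory is a separate question.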
-- Ricardo