From mboxrd@z Thu Jan 1 00:00:00 1970
From: zimoun
To: Ludovic Courtès
Cc: Guix Devel
Subject: Re: Accuracy of importers?
In-Reply-To: <87o86tjmvh.fsf@gnu.org>
References: <878ryd8we4.fsf@inria.fr> <87ilxfwl2q.fsf@gnu.org> <86wnlujyvw.fsf@gmail.com> <87o86tjmvh.fsf@gnu.org>
Date: Tue, 09 Nov 2021 19:36:02 +0100
Message-ID: <86ee7pnpm5.fsf@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
List-Id: "Development of GNU Guix and the GNU System distribution."

Hi,

On Tue, 09 Nov 2021 at 17:48, Ludovic Courtès wrote:

> What I think those figures show is the amount of manual tweaks necessary
> to get a proper package “à la Guix”, with tests running etc.  For PyPI
> we often need to add things under ‘native-inputs’, hence the 71%
> “different inputs” line.  For CRAN that’s sometimes necessary, but much
> less frequently.  There are also cases with non-R/non-Python
> dependencies.

The numbers are based on a “dependencies” mismatch.  But this mismatch
is sometimes artificial.  For instance, I am not convinced that upstream
distinguishes between build-time (or test-time) dependencies and
run-time dependencies.  I mean that many packages would work with all
their dependencies directly inside ‘propagated-inputs’ or ‘inputs’
(probably what the importers return), whereas packaging “à la Guix”
moves some of them to ‘native-inputs’.  Well, I do not know what we can
conclude in the end.

For instance, the numbers are:

Accuracy for 'pypi' (200 packages):
  accurate: 58 (29%)
  different inputs: 142 (71%)
  different source: 0 (0%)
  inconclusive: 0 (0%)
Accuracy for 'cran' (200 packages):
  accurate: 176 (88%)
  different inputs: 23 (12%)
  different source: 1 (0%)
  inconclusive: 0 (0%)

but among these, how many CRAN packages have dependencies other than
the ones listed in ‘propagated-inputs’?  I guess 24.  My point is that
there is a strong bias coming from the “complexity” of the packages.
If CRAN packages are “simpler”, then indeed their imports are more
accurate.
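
To make the bias concrete, here is a minimal Python sketch.  Everything
in it is an assumption for illustration: the two package classes
("propagated-only" and "other-inputs"), the per-class accuracies and the
sample compositions are invented, and none of it is derived from the
measured figures above.  It only shows that the same importer accuracy
per class can yield very different overall numbers when the two
200-package samples do not have the same mix.

```python
# Hypothetical illustration only: all numbers below are made up.

def expected_accurate(sample, accuracy_pct):
    """Expected number of accurate imports in a sample.

    sample:       mapping class name -> number of packages of that class
    accuracy_pct: mapping class name -> accuracy for that class, in percent
    """
    return sum(count * accuracy_pct[cls] for cls, count in sample.items()) / 100


# Assumed accuracy per package class, identical for both samples.
ACCURACY_PCT = {"propagated-only": 80, "other-inputs": 40}

# Assumed composition of two samples of 200 packages each.
sample_a = {"propagated-only": 180, "other-inputs": 20}
sample_b = {"propagated-only": 20, "other-inputs": 180}

for name, sample in (("A", sample_a), ("B", sample_b)):
    total = sum(sample.values())
    accurate = expected_accurate(sample, ACCURACY_PCT)
    print(f"sample {name}: {accurate:.0f}/{total} accurate "
          f"({100 * accurate / total:.0f}%)")
```

With these made-up inputs the overall accuracy differs a lot between
the two samples although the per-class accuracy is the same, so the gap
comes only from the composition of each sample.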

In other words, when picking 200 samples for each importer, each batch
of 200 should have the same distribution of inputs:

 - X ‘propagated-inputs’ only
 - Y ‘propagated-inputs’ and ‘inputs’
 - Z ‘propagated-inputs’ and ‘inputs’ and ‘native-inputs’

where X+Y+Z=100%.  Then the numbers for the two importers become
“comparable”.

>> My understanding of this experiment is about upstream “quality”, not
>> about importer “accuracy”.  Do I incorrectly understand?
>
> Yes, in a way, assuming our importers are not lossy, this tells us
> whether the upstream repo contains enough information and/or whether
> that information is accurate.

Thanks for explaining.

Cheers,
simon