References: <CAEEhgEuT+YnGZMFB=v=zM56RfOULbXdxt4mHBXp8_X+eJM6Htg@mail.gmail.com>
 <87sfdckp05.fsf@gmail.com>
In-Reply-To: <87sfdckp05.fsf@gmail.com>
From: Nathan Dehnel <ncdehnel@gmail.com>
Date: Sat, 8 Apr 2023 05:21:27 -0500
Message-ID: <CAEEhgEtBDE5XxHSgWitOWbhFTu4Q=bv=0gMQud6eNXBQ3CEBeA@mail.gmail.com>
Subject: Re: Guidelines for pre-trained ML model weight binaries (Was re:
 Where should we put machine learning model parameters?)
To: Simon Tournier <zimon.toutoune@gmail.com>
Cc: rprior@protonmail.com, guix-devel@gnu.org

> From my point of view, tackling such biased weights is not done via
> re-learning, because how would one draw the line between biased weights,
> mistakes on their side, mistakes on our side, etc.?  And it requires a
> high level of expertise to complete a full re-learning.

This strikes me as similar to someone in the '80s, when Stallman was
writing the GPL and years before Nix was invented, saying: "the
solution to backdoors in executables is not access to source code,
because compiling from scratch is difficult for the average user and
bit-reproducible binaries are difficult to make." Bit reproducibility
WAS possible, it was just difficult, so in practice users had to use
distro binaries they couldn't fully trust, and some of the benefits of
having the source code available were rather theoretical for a while.
So this argument strikes me as preemptively compromising one's
principles on the presumption that no new technology will ever come
along that lets one exploit the benefits of those principles in
practice.

> Instead, it should come from the ML community, which should standardize
> formal methods for verifying that the training has not been biased, IMHO.

What known "formal methods" are there for that? According to the
article, hiding the backdoor in the "white-box" scenario is
cryptographically secure in the specific case, and the same
possibility remains open for the general case.

On Fri, Apr 7, 2023 at 5:53 AM Simon Tournier <zimon.toutoune@gmail.com> wrote:
>
> Hi,
>
> On Fri, 07 Apr 2023 at 00:50, Nathan Dehnel <ncdehnel@gmail.com> wrote:
>
> > I am uncomfortable with including ML models without their training
> > data available. It is possible to hide backdoors in them.
> > https://www.quantamagazine.org/cryptographers-show-how-to-hide-invisible-backdoors-in-ai-20230302/
>
> Thanks for pointing out this article!  Some non-mathematical parts of
> the original article [1] are also worth a look. :-)
>
> First, please note that, IMHO, we are more or less in the case of
> “The Open Box”:
>
>         But what if a company knows exactly what kind of model it wants,
>         and simply lacks the computational resources to train it? Such a
>         company would specify what network architecture and training
>         procedure to use, and it would examine the trained model
>         closely.
>
> And yeah, there is nothing new ;-) in saying that the result could be
> biased by the person who produced the data.  Yes, we have to trust the
> trainer, just as we trust the people who generated “biased” (*) genomic
> references.
>
> Well, it is very interesting, and scary, to see how one could
> theoretically exploit the fact that models “misclassify adversarial
> examples”, as described e.g. by [2].
>
> This raises questions about “Verifiable Delegation of Learning”.
>
> From my point of view, tackling such biased weights is not done via
> re-learning, because how would one draw the line between biased weights,
> mistakes on their side, mistakes on our side, etc.?  And it requires a
> high level of expertise to complete a full re-learning.  Instead, it
> should come from the ML community, which should standardize formal
> methods for verifying that the training has not been biased, IMHO.
>
> 2: https://arxiv.org/abs/1412.6572
>
> (*) biased genomic references, for one example among many others:
>
>         Relatedly, reports have persisted of major artifacts that arise
>         when identifying variants relative to GRCh38, such as an
>         apparent imbalance between insertions and deletions (indels)
>         arising from systematic mis-assemblies in GRCh38
>         [15–17]. Overall, these errors and omissions in GRCh38 introduce
>         biases in genomic analyses, particularly in centromeres,
>         satellites, and other complex regions.
>
>         https://doi.org/10.1101/2021.07.12.452063
>
>
> Cheers,
> simon