From: Nathan Dehnel
Date: Sat, 8 Apr 2023 05:21:27 -0500
Subject: Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)
To: Simon Tournier
Cc: rprior@protonmail.com, guix-devel@gnu.org
In-Reply-To: <87sfdckp05.fsf@gmail.com>

> From my point of view, the tackle of such biased weights is not via
> re-learning because how to draw the line between biased weights,
> mistakes on their side, mistakes on our side, etc. and it requires a
> high level of expertise to complete a full re-learning.

This strikes me as similar to being in the 80s, when Stallman was writing the GPL, years before Nix was invented, and saying "the solution to backdoors in executables is not access to source code due to the difficulty of compiling from scratch for the average user and due to the difficulty of making bit-reproducible binaries." Like, bit reproducibility WAS possible, it was just difficult, so practically speaking users had to use distro binaries they couldn't fully trust. So some of the benefits of the source code being available were rather theoretical for a while.
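
To make the reproducibility point concrete, here is a minimal sketch in Python of what checking two builds for bit-identity amounts to. The function names and the idea of comparing two local files are assumptions for illustration only; this is not how Guix or any particular distro implements it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bit_identical(a: Path, b: Path) -> bool:
    """True when two independently produced artifacts hash to the same digest."""
    return sha256_of(a) == sha256_of(b)
```

The check itself is trivial. The hard part, then as now, is getting two independent builds (or, presumably, two independent training runs) to produce the same bytes in the first place.
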

So this argument strikes me as pre-emptively compromising one's principles based on the presumption that a new technology will never come along that allows one to practically exploit the benefits of said principles.

> Instead, it should come from the ML community that should standardize
> formal methods for verifying that the training had not been biased, IMHO.

What "formal methods" for that are known? As per the article, the hiding of the backdoor in the "whitebox" scenario is cryptographically secure in the specific case, with that same possibility open for the general case.

On Fri, Apr 7, 2023 at 5:53 AM Simon Tournier wrote:
>
> Hi,
>
> On ven., 07 avril 2023 at 00:50, Nathan Dehnel wrote:
>
> > I am uncomfortable with including ML models without their training
> > data available. It is possible to hide backdoors in them.
> > https://www.quantamagazine.org/cryptographers-show-how-to-hide-invisible-backdoors-in-ai-20230302/
>
> Thanks for pointing this article! And some non-mathematical part of the
> original article [1] are also worth to give a look. :-)
>
> First please note that we are somehow in the case “The Open Box”, IMHO:
>
>     But what if a company knows exactly what kind of model it wants,
>     and simply lacks the computational resources to train it? Such a
>     company would specify what network architecture and training
>     procedure to use, and it would examine the trained model
>     closely.
>
> And yeah there is nothing new ;-) when one says that the result could be
> biased by the person that produced the data. Yeah, we have to trust the
> trainer as we are trusting the people who generated “biased” (*) genomic
> references.
>
> Well, it is very interesting – and scary – to see how to theoretically
> exploit “misclassify adversarial examples“ as described e.g. by [2].
>
> This raises questions about “Verifiable Delegation of Learning”.
>
> From my point of view, the tackle of such biased weights is not via
> re-learning because how to draw the line between biased weights,
> mistakes on their side, mistakes on our side, etc. and it requires a
> high level of expertise to complete a full re-learning. Instead, it
> should come from the ML community that should standardize formal methods
> for verifying that the training had not been biased, IMHO.
>
> 2: https://arxiv.org/abs/1412.6572
>
> (*) biased genomic references, for one example among many others:
>
>     Relatedly, reports have persisted of major artifacts that arise
>     when identifying variants relative to GRCh38, such as an
>     apparent imbalance between insertions and deletions (indels)
>     arising from systematic mis-assemblies in GRCh38
>     [15–17]. Overall, these errors and omissions in GRCh38 introduce
>     biases in genomic analyses, particularly in centromeres,
>     satellites, and other complex regions.
>
>     https://doi.org/10.1101/2021.07.12.452063
>
>
> Cheers,
> simon