From: Simon Tournier
To: Nathan Dehnel, rprior@protonmail.com, guix-devel@gnu.org
Subject: Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)
Date: Fri, 07 Apr 2023 11:42:02 +0200
Message-ID: <87sfdckp05.fsf@gmail.com>

Hi,

On Fri, 07 Apr 2023 at 00:50, Nathan Dehnel wrote:

> I am uncomfortable with including ML models without their training
> data available. It is possible to hide backdoors in them.
> https://www.quantamagazine.org/cryptographers-show-how-to-hide-invisible-backdoors-in-ai-20230302/

Thanks for pointing out this article! Some non-mathematical parts of
the original article [1] are also worth a look. :-)

First, please note that we are more or less in the “The Open Box” case,
IMHO:

    But what if a company knows exactly what kind of model it wants,
    and simply lacks the computational resources to train it? Such a
    company would specify what network architecture and training
    procedure to use, and it would examine the trained model closely.

And yeah, there is nothing new ;-) in saying that the result could be
biased by whoever produced the data. We have to trust the trainer, just
as we trust the people who generated “biased” (*) genomic references.

Well, it is very interesting, and scary, to see how one can
theoretically exploit the fact that models “misclassify adversarial
examples”, as described e.g. by [2].

This raises questions about “Verifiable Delegation of Learning”.

From my point of view, the way to tackle such biased weights is not
re-learning: how would we draw the line between biased weights,
mistakes on their side, mistakes on our side, etc.? Besides, completing
a full re-learning requires a high level of expertise. Instead, the
answer should come from the ML community, which should standardize
formal methods for verifying that the training has not been biased,
IMHO.

2: https://arxiv.org/abs/1412.6572

(*) biased genomic references, for one example among many others:

    Relatedly, reports have persisted of major artifacts that arise
    when identifying variants relative to GRCh38, such as an apparent
    imbalance between insertions and deletions (indels) arising from
    systematic mis-assemblies in GRCh38 [15–17]. Overall, these errors
    and omissions in GRCh38 introduce biases in genomic analyses,
    particularly in centromeres, satellites, and other complex regions.

    https://doi.org/10.1101/2021.07.12.452063

Cheers,
simon