Subject: bug#69982: Setting inodes to 0 leads to incorrect output when extracting with GNU cpio
To: 69982@debbugs.gnu.org
Date: Sun, 24 Mar 2024 16:17:22 +0000
Message-ID: <83c29759-4e54-40ef-a9d3-b27c4774cd02@protonmail.com>
From: Skyler Ferris via Bug reports for GNU Guix
Reply-to: Skyler Ferris

Hello,

I have encountered a bug that is caused by the interaction of
write-cpio-archive from (gnu build linux-initrd) writing all inodes as 0
and the way that GNU cpio processes file headers. I observed this bug
while creating a custom initramfs where init is based on a bash script
used by another distribution (but I will provide a minimal reproducer
below). This bug only manifests when the input directory contains more
than one set of hard-linked files. This email will contain a short set
of reproduction steps, an explanation of what I understand the cause of
the bug to be, some possible paths forward, and a disclaimer about my
limitations due to my background.

To reproduce this bug, run the following commands:

```shell
$ mkdir /tmp/source
$ cd /tmp/source
$ echo contents1 > file1.txt
$ ln file1.txt link1.txt
$ echo contents2 > file2.txt
$ echo contents3 > file3.txt
$ ln file3.txt link3.txt
$ guix repl
> (use-modules (gnu build linux-initrd))
> ; disable compression so we don't waste time on it while debugging, it does not impact reproduction
> (write-cpio-archive "." "../archive.cpio" #:compress? #f)
> ,q
$ cd ..
$ mkdir out
$ cd out
$ cat ../archive.cpio | cpio -i
$ cat *
```

After running the final step you will see that all of file1.txt,
link1.txt, file3.txt, and link3.txt have the contents "contents1": the
files which should contain "contents3" have been created incorrectly.

Now I will list the set of steps the relevant programs performed which
caused this error, followed by a more verbose explanation with
references to source code:

1. Guix creates the archive with the inode and major & minor device
numbers set to 0. The number of hard links is reported accurately.
2. CPIO reads the archive and hard links files when the header indicates
that there are multiple links. It uses the inode and major & minor
device numbers to find the correct file to hard link to.
3. As file3.txt and link3.txt both have multiple links and share their
inode and major & minor device numbers with file1.txt, they are all
linked to file1.txt.

This error occurs when the cpio utility processes files with hard links.
In `copyin_regular_file`, there is a code block which only runs if the
file has multiple hard links and the newascii (or checksummed new ascii)
format is in use (1). Within that code block there is a conditional to
check if the file size is 0, with a comment explaining that the newascii
format only records the data for the final file pointing to the relevant
inode rather than repeating the data each time.
The code in guix/cpio.scm does not actually do this, so that branch
never executes. Instead, the other code block runs, which simply calls
`link_to_maj_min_ino` (and checks for an error code) (2). This uses
`find_inode_file`, which references a hash table that associates the
inode/major device/minor device with a file path, and if it finds a
match then it creates a hard link on the target file system. However,
Guix's `file->cpio-header*` sets all of the inode and device numbers to
0 for reproducibility. This causes cpio to hard link every file with
multiple links to the first file that has multiple links.

I see 3 possible paths forward to address this issue:

1. Provide spoofed inode numbers, tracking hard link data. In (gnu build
linux-initrd), the `write-cpio-archive` procedure sorts the files by
name so we can provide inode numbers that increase sequentially.
However, in order to make sure that the correct hard links are findable
by the cpio utility we would need to track the real inode numbers as
well and use the correct pseudonym in each place. This would noticeably
increase the complexity of the code.
2. Provide spoofed inode numbers and spoofed hard link data. In order to
avoid tracking the real hard link numbers we can just report all files
as having only a single link, and still provide sequential inode numbers
as above. This will not increase the size of the cpio archives we
generate compared to current output because we are storing the data for
each link anyway. This will add some complexity to the cpio code, but
less than option 1.
3. Don't support inputs with multiple hard links and require callers to
work around this issue. This avoids any changes to the cpio code.

I am in favor of option 2 because I think it strikes a good balance
between keeping the cpio code stable and supporting reasonable use
cases.
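
The lookup described above, and the effect of option 2, can be
illustrated with a small Python model. This is only a sketch under
stated assumptions: `Header`, `extract`, `current_headers` and
`option2_headers` are hypothetical names invented for the illustration,
the extractor is reduced to the hash-table lookup as described in this
email rather than taken from GNU cpio's source, and keeping the device
numbers at 0 under option 2 is an assumption.

```python
from typing import NamedTuple


class Header(NamedTuple):
    name: str
    ino: int
    major: int
    minor: int
    nlink: int
    data: str


def extract(headers):
    """Toy model of the lookup described above, not GNU cpio's actual code."""
    first_seen = {}  # (ino, major, minor) -> first multi-link path extracted
    contents = {}    # path -> contents on the target file system
    for h in headers:
        key = (h.ino, h.major, h.minor)
        if h.nlink > 1 and key in first_seen:
            # A match is found, so the entry becomes a hard link to that path.
            contents[h.name] = contents[first_seen[key]]
        else:
            contents[h.name] = h.data
            if h.nlink > 1:
                first_seen[key] = h.name
    return contents


# name -> (real link count, contents), mirroring the reproduction steps
source = {
    "file1.txt": (2, "contents1"),
    "link1.txt": (2, "contents1"),
    "file2.txt": (1, "contents2"),
    "file3.txt": (2, "contents3"),
    "link3.txt": (2, "contents3"),
}


def current_headers(files):
    # Current behaviour: inode and device numbers are 0, link count is real.
    return [Header(name, 0, 0, 0, nlink, data)
            for name, (nlink, data) in sorted(files.items())]


def option2_headers(files):
    # Option 2: sequential inode numbers in sorted order, link count always 1.
    return [Header(name, ino, 0, 0, 1, data)
            for ino, (name, (_, data)) in enumerate(sorted(files.items()), start=1)]


broken = extract(current_headers(source))
fixed = extract(option2_headers(source))
print(broken["file3.txt"], fixed["file3.txt"])  # prints "contents1 contents3"
```

In this model the current headers make file3.txt and link3.txt resolve
to file1.txt, as in the reproduction, while the option 2 headers never
reach the hard link branch, so every file keeps its own data.
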
The cpio code is used to build the initramfs in Guix systems so a
bug here could make some systems unbootable. Guix does provide
transactional rollbacks which is helpful but it is still a frustrating
experience to reboot and immediately see a crash; debugging issues in
this early environment is significantly more difficult than debugging
post-boot issues. Hard links are not common on many systems because they
add complexity to filesystem analysis, but Guix makes good use of them
to save space in the store, where it is common for many files to share
data and creating symlinks would prevent the garbage collector from
deleting otherwise unused outputs.

The limitations I referred to in the beginning of the email are that I
am inexperienced in this domain. I have only recently (over the past
month or so) started looking at building a custom initramfs, and I have
never worked with CPIO archives before. I think that my analysis makes
sense based on the code I have read and the behavior I have observed,
but take everything I say with a grain of salt.

I would appreciate any thoughts that anyone has on this matter.

Regards,
Skyler

(1)
https://git.savannah.gnu.org/cgit/cpio.git/tree/src/copyin.c?id=900bab656ff24db5e3099941fb909c79c07962ed#n400
(2)
https://git.savannah.gnu.org/cgit/cpio.git/tree/src/copypass.c?id=900bab656ff24db5e3099941fb909c79c07962ed#n341