From: Gábor Boskovits
Subject: Re: cannot boot with BTRFS in degraded mode
Date: Wed, 4 Sep 2019 22:49:09 +0200
To: Giovanni Biscuolo
Cc: help-guix
In-Reply-To: <87lfv4163w.fsf@roquette.mug.biscuolo.net>

Hello Giovanni,

Giovanni Biscuolo wrote (on Wed, 4 Sep 2019, 16:36):

> Hi Guix!
>
> Yesterday I had to physically replace a failed disk on milano-guix-1
> (one of the Guix build machines). That disk was part of a BTRFS RAID10
> multi-disk array, and now the machine is unbootable.

Sorry to hear that.

> The BTRFS RAID10 array was made of 6 disks and was running well. Some
> days ago Christopher Baines found that the 5th disk (/dev/sde) of that
> array had failed, and he was able to remount it in degraded mode in order
> to re-balance the array and go on working without data loss.
>
> Unfortunately I was not able to perform a "btrfs replace..."
> since adding a new disk (we have spare slots) was not detected by the
> kernel... HP ProLiant Smart Array is not so smart after all (aka bye bye
> hot swapping of disks) :-S...
>
> So I had to reboot the server and enter the config tool, where I added
> the new drive as a new Smart Array logical volume (RAID0 with 1 drive) [1]
> and removed the failed logical volume.
>
> The problem now is that the boot process stops when trying to mount the
> BTRFS filesystem; the error is:
>
> --8<---------------cut here---------------start------------->8---
> BTRFS error (device sda3): devid 5 uuid [omissis] is missing
> --8<---------------cut here---------------end--------------->8---
>
> ([omissis] means I'm not copying the exact uuid; sda3 is the first block
> device in the BTRFS pool)
>
> All I get now is the guix rescue environment prompt, which I do not know
> how to use: I'm not able to boot with BTRFS in degraded mode :-S
>
> Christopher suggested I might be able to at least mount the filesystem
> with the degraded option in the guix rescue environment, which might be
> something like:
>
> --8<---------------cut here---------------start------------->8---
> (mkdir "/mnt/broken-root")
> (mount "/dev/sda3" "/mnt/broken-root" "btrfs" 0 "degraded")
> --8<---------------cut here---------------end--------------->8---
>
> but we do not know how to proceed from there.

I don't know what would work from here, but here are a few ideas:

1. Somehow hack the degraded root option into the bootloader config, as
   described here:
   https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1229456
2. Try to switch_root, using /bin/sh as init, and fix the bootloader
   config from there.
3. See what the original boot script is doing by having a look at how it
   is composed; see for example:
   gnu/system.scm: operating-system-default-essential-services,
   gnu/services.scm: %boot-service
   and, most prominently,
   gnu/services/shepherd.scm: shepherd-boot-gexp

Wdyt?
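
For what it's worth, the degraded mount Christopher suggested, followed by
the "btrfs replace" step mentioned above, could be sketched as a small
script for an ordinary rescue shell (for example a USB rescue system, not
the Guile prompt of the Guix initrd). This is an untested sketch under
assumptions: /dev/sda3 and devid 5 are taken from the error message,
/dev/sdX is only a placeholder for the new disk, and the names "run",
"replace_missing_device" and DRY_RUN are made up for illustration. By
default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Untested sketch: mount a degraded BTRFS pool and replace a missing device.
# Assumes a root shell on a rescue system with btrfs-progs installed.

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: replace_missing_device POOL_DEV MISSING_DEVID NEW_DEV MOUNTPOINT
# (hypothetical helper, not part of btrfs-progs or Guix)
replace_missing_device() {
    pool_dev=$1
    missing_devid=$2
    new_dev=$3
    mnt=$4

    run mkdir -p "$mnt" || return 1
    # "degraded" lets the pool mount although one member is missing.
    run mount -t btrfs -o degraded "$pool_dev" "$mnt" || return 1
    # Rebuild the missing member (referenced by its devid) onto the new disk.
    run btrfs replace start "$missing_devid" "$new_dev" "$mnt" || return 1
    # Report the progress of the replace operation.
    run btrfs replace status "$mnt"
}

# Dry run with the values from the error message; /dev/sdX is a placeholder.
replace_missing_device /dev/sda3 5 /dev/sdX /mnt/broken-root
```

Setting DRY_RUN=0 would actually run the commands, so the device names
should be double-checked first (for example with "btrfs filesystem show").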

> Obviously I have no way now to reconfigure guix; the only idea I have is
> to boot from a USB rescue disk (e.g. grml) and try to do a "btrfs
> replace..." from there. That should fix the BTRFS array and allow a
> mount in non-degraded mode, so the next Guix boot should succeed.
>
> That machine is physically far away from me, and I should collect as much
> info as possible before I go there to test a solution (no remote
> serial console, unfortunately).
>
> I'm searching the web for a solution; any hint will be greatly
> appreciated :-)
>
> Meanwhile the milano-guix-1 build machine is offline... :-(
>
> Thank you for your attention, Gio'
>
> [1] AFAIU that is the only way to present a single disk to the OS and
> let the OS manage it as part of a **software** RAID pool (hardware RAID
> is not an option)
>
> --
> Giovanni Biscuolo
>
> Xelera IT Infrastructures

Best regards,
g_bor

-- 
OpenPGP Key Fingerprint: 7988:3B9F:7D6A:4DBF:3719:0367:2506:A96C:CF63:0B21