unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#43344: "basic" system tests fail (and all the other ones) on guix master
@ 2020-09-11 17:50 Danny Milosavljevic
  2020-09-14 13:26 ` Mathieu Othacehe
  2020-09-16 13:59 ` Ludovic Courtès
  0 siblings, 2 replies; 17+ messages in thread
From: Danny Milosavljevic @ 2020-09-11 17:50 UTC (permalink / raw)
  To: 43344

[-- Attachment #1: Type: text/plain, Size: 2978 bytes --]

Hi,

as of guix master commit 0fb974be9c3e1e22a2145c9c602c44cd10cef2b0 all system
tests, including "basic", fail:

$ guix environment --pure guix --ad-hoc git guile-readline guile-json nano guile-zlib guile-lzlib
(env)$ make TESTS=basic check-system
loading '/gnu/store/s3limrgxj4pd6b4psra66phary2nmqx4-linux-vm-loader'...
[    2.776050] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
environment variable `PATH' set to `/gnu/store/j3jlpncfqvykkq6sx7h4ly1rdcr2a8qq'
creating partition table with 2 partitions (20.0 MiB, 40.0 MiB)...
Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, .
Backtrace:
           3 (primitive-load "/gnu/store/liacrhaiab35pcfzdqsa71fvp7f?")
In ./gnu/build/vm.scm:
   470:21  2 (initialize-hard-disk "/dev/vda" #:bootloader-package _ ?)
    300:2  1 (initialize-partition-table "/dev/vda" (#<<partitio???@ build-log 1
?@ build-log 29023 1
?@ build-log 29023 1
>@ build-log 29023 1
 @ build-log 29023 1
?@ build-log 29023 1
?@ build-log 29023 1
?@ build-log 29023 1
)@ build-log 29023 1
 @ build-log 29023 1
?@ build-log 29023 1
?@ build-log 29023 1
?@ build-log 29023 1
)@ build-log 29023 1
@ build-log 29023 1

In ./guix/build/utils.scm:
    654:6  0 (invoke _ . _)

./guix/build/utils.scm:654:6: In procedure invoke:
ERROR:
  1. &invoke-error:
      program: "parted"
      arguments: ("--script" "/dev/vda" "mklabel" "msdos" "mkpart" "primary" "e)
      exit-status: 1
      term-signal: #f
      stop-signal: #f
[    3.446031] Unregister pv shared memory for cpu 0
[    3.447989] reboot: Restarting system
[    3.449167] reboot: machine restart
Backtrace: 
           1 (primitive-load "/gnu/store/1xp94hddh88szwkbqyb49jirmf5?")
In ./gnu/build/vm.scm:
   198:12  0 (load-in-linux-vm _ #:output _ #:qemu _ #:memory-size _ ?)

./gnu/build/vm.scm:198:12: In procedure load-in-linux-vm:
guest VM code exited with a non-zero status 256
note: keeping build directory `/tmp/guix-build-qemu-image.drv-0'
builder for `/gnu/store/ks5w1bwcm2rg1fq5s9hc78grxpnm0wpf-qemu-image.drv' failed1
build of /gnu/store/ks5w1bwcm2rg1fq5s9hc78grxpnm0wpf-qemu-image.drv failed
View build log at '/var/log/guix/drvs/ks/5w1bwcm2rg1fq5s9hc78grxpnm0wpf-qemu-im.
cannot build derivation `/gnu/store/8vhwchcxlns64nnqr13p0liw745s6pp8-run-vm.sh.t
cannot build derivation `/gnu/store/672vqgm8mx0a6k2iwsf13q73gmk5vdc4-basic.drv't
guix build: error: build of `/gnu/store/672vqgm8mx0a6k2iwsf13q73gmk5vdc4-basic.d
make: *** [Makefile:6007: check-system] Error 1
dannym@bayfront ~/src/guix-master/guix [env]$

The same happens on my laptop.

./pre-inst-env guix describe
Git checkout:
  repository: /home/dannym/src/guix-master/guix
  branch: master
  commit: 0fb974be9c3e1e22a2145c9c602c44cd10cef2b0

(ssh://dannym@git.sv.gnu.org/srv/git/guix.git)

I've tried to add a call to wipefs before parted--it doesn't help.

Before the problem surfaced I saw a qemu major update.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-11 17:50 bug#43344: "basic" system tests fail (and all the other ones) on guix master Danny Milosavljevic
@ 2020-09-14 13:26 ` Mathieu Othacehe
  2020-09-14 16:31   ` Danny Milosavljevic
  2020-09-16 13:59 ` Ludovic Courtès
  1 sibling, 1 reply; 17+ messages in thread
From: Mathieu Othacehe @ 2020-09-14 13:26 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: 43344


Hello Danny,

>   1. &invoke-error:
>       program: "parted"
>       arguments: ("--script" "/dev/vda" "mklabel" "msdos" "mkpart" "primary" "e)

So it looks like the parted script failed inside the VM, while running
"initialize-partition-table".

This work fine both on the build farm and on my machine, which makes the
debug harder :(.

Anything special with your hardware? KVM support disabled maybe?

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-14 13:26 ` Mathieu Othacehe
@ 2020-09-14 16:31   ` Danny Milosavljevic
  2020-09-15 22:34     ` Mark H Weaver
  0 siblings, 1 reply; 17+ messages in thread
From: Danny Milosavljevic @ 2020-09-14 16:31 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 43344

[-- Attachment #1: Type: text/plain, Size: 392 bytes --]

Hi Mathieu,

On Mon, 14 Sep 2020 15:26:52 +0200
Mathieu Othacehe <othacehe@gnu.org> wrote:

> Anything special with your hardware? KVM support disabled maybe?

The culprit had been the Linux kernel update to 5.8.8.

After downgrading to 5.8.7 it works just fine--no other changes done.

Previously, I had tried also to add wipefs -a before the parted--that
hadn't helped either.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-14 16:31   ` Danny Milosavljevic
@ 2020-09-15 22:34     ` Mark H Weaver
  2020-09-15 23:06       ` Leo Famulari
  0 siblings, 1 reply; 17+ messages in thread
From: Mark H Weaver @ 2020-09-15 22:34 UTC (permalink / raw)
  To: Danny Milosavljevic, Mathieu Othacehe; +Cc: 43344

Hi,

Danny Milosavljevic <dannym@scratchpost.org> writes:

> On Mon, 14 Sep 2020 15:26:52 +0200
> Mathieu Othacehe <othacehe@gnu.org> wrote:
>
>> Anything special with your hardware? KVM support disabled maybe?
>
> The culprit had been the Linux kernel update to 5.8.8.
>
> After downgrading to 5.8.7 it works just fine--no other changes done.
>
> Previously, I had tried also to add wipefs -a before the parted--that
> hadn't helped either.

That's useful information, but we should not stay frozen on the 5.8.7
kernel for much longer.  5.8.8 contains many bug fixes, some of which
might fix potentially exploitable flaws.

It would be useful to know if this problem still occurs with 5.8.9,
which has since come out.  If so, we should do a bisection between 5.8.7
and 5.8.8 to find out which upstream commit introduced the problem.

Would anyone like to investigate this further?

        Mark




^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-15 22:34     ` Mark H Weaver
@ 2020-09-15 23:06       ` Leo Famulari
  2020-09-16 14:12         ` Leo Famulari
  0 siblings, 1 reply; 17+ messages in thread
From: Leo Famulari @ 2020-09-15 23:06 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Mathieu Othacehe, 43344

On Tue, Sep 15, 2020 at 06:34:21PM -0400, Mark H Weaver wrote:
> That's useful information, but we should not stay frozen on the 5.8.7
> kernel for much longer.  5.8.8 contains many bug fixes, some of which
> might fix potentially exploitable flaws.
> 
> It would be useful to know if this problem still occurs with 5.8.9,
> which has since come out.  If so, we should do a bisection between 5.8.7
> and 5.8.8 to find out which upstream commit introduced the problem.
> 
> Would anyone like to investigate this further?

I will try to reproduce the bug with 5.8.9 now. I will try the bisection
if time permits.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-11 17:50 bug#43344: "basic" system tests fail (and all the other ones) on guix master Danny Milosavljevic
  2020-09-14 13:26 ` Mathieu Othacehe
@ 2020-09-16 13:59 ` Ludovic Courtès
  1 sibling, 0 replies; 17+ messages in thread
From: Ludovic Courtès @ 2020-09-16 13:59 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: 43344

[-- Attachment #1: Type: text/plain, Size: 3902 bytes --]

Hi,

Danny Milosavljevic <dannym@scratchpost.org> skribis:

> environment variable `PATH' set to `/gnu/store/j3jlpncfqvykkq6sx7h4ly1rdcr2a8qq'
> creating partition table with 2 partitions (20.0 MiB, 40.0 MiB)...
> Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, .

[...]

>   1. &invoke-error:
>       program: "parted"
>       arguments: ("--script" "/dev/vda" "mklabel" "msdos" "mkpart" "primary" "e)
>       exit-status: 1
>       term-signal: #f
>       stop-signal: #f

The code in question in Parted:

--8<---------------cut here---------------start------------->8---
                if (!add_partition (disk, part)) {
                        ok[i - 1] = 0;
                        errnums[i - 1] = errno;
                }

[…]

        char *bad_part_list = NULL;
        /* now warn about any errors */
        for (i = 1; i <= lpn; i++) {
                if (ok[i - 1] || errnums[i - 1] == ENXIO)
                        continue;
                if (bad_part_list == NULL) {
                        bad_part_list = malloc (lpn * 5);
                        if (!bad_part_list)
                                goto cleanup;
                        bad_part_list[0] = 0;
                }
                sprintf (bad_part_list + strlen (bad_part_list), "%d, ", i);
        }
        if (bad_part_list == NULL)
                ret = 1;
        else {
                bad_part_list[strlen (bad_part_list) - 2] = 0;
                if (ped_exception_throw (
                        PED_EXCEPTION_ERROR,
                        PED_EXCEPTION_IGNORE_CANCEL,
                        _("Partition(s) %s on %s have been written, but we have "
                          "been unable to inform the kernel of the change, "
                          "probably because it/they are in use.  As a result, "
                          "the old partition(s) will remain in use.  You "
                          "should reboot now before making further changes."),
                        bad_part_list, disk->dev->path) == PED_EXCEPTION_IGNORE)
                        ret = 1;
                free (bad_part_list);
        }
--8<---------------cut here---------------end--------------->8---

With the patch below, I strace’d ‘parted’, which gives:

--8<---------------cut here---------------start------------->8---
$ make check-system TESTS=basic

[…]

ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=253, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=254, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=255, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=256, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
write(2, "Error", 5Error)                    = 5
write(2, ": ", 2: )                       = 2
write(2, "Partition(s) 1, 2, 3, 4, 5, 6, 7"..., 495Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 on /dev/vda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.) = 495
write(2, "\n", 1
)                       = 1
--8<---------------cut here---------------end--------------->8---

So I threw more virtual RAM at it:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 416 bytes --]

diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
@@ -446,6 +450,7 @@ system that is passed to 'populate-root-file-system'."
                                      #:bootloader-installer
                                      #+(bootloader-installer bootloader)))))))
    #:system system
+   #:memory-size 1024
    #:make-disk-image? #t
    #:disk-image-size disk-image-size
    #:disk-image-format disk-image-format

[-- Attachment #3: Type: text/plain, Size: 65 bytes --]


… but that doesn’t help.

Ideas?

Thanks,
Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: Type: text/x-patch, Size: 2163 bytes --]

diff --git a/gnu/build/vm.scm b/gnu/build/vm.scm
index 287d099f79..e793b5b518 100644
--- a/gnu/build/vm.scm
+++ b/gnu/build/vm.scm
@@ -297,7 +297,7 @@ actual /dev name based on DEVICE."
                                      partition-size)
                             partitions)
                        ", "))
-  (apply invoke "parted" "--script"
+  (apply invoke "strace" "parted" "--script"
          device "mklabel" label-type
          (options partitions offset))
 
diff --git a/gnu/packages/linux.scm b/gnu/packages/linux.scm
index 59ffb334e0..72fb3ca49d 100644
--- a/gnu/packages/linux.scm
+++ b/gnu/packages/linux.scm
@@ -349,15 +349,15 @@ corresponding UPSTREAM-SOURCE (an origin), using the given DEBLOB-SCRIPTS."
 
 ;; The current "stable" kernel. That is, the most recently released major
 ;; version.
-(define-public linux-libre-5.8-version "5.8.7")
+(define-public linux-libre-5.8-version "5.8.8")
 (define deblob-scripts-5.8
   (linux-libre-deblob-scripts
-   linux-libre-5.8-version
+   "5.8.7"
    (base32 "07z7sglyrfh0706icqqf3shadf638pvyid9386r661ds5lbsa2mw")
    (base32 "0j6jba5fcddqlb42f95gjl78jisfla4nswqila074gglcrbnl9q7")))
 (define-public linux-libre-5.8-pristine-source
   (let ((version linux-libre-5.8-version)
-        (hash (base32 "1zhpzlhl2ykna2nc70m72wlgyv1pkvkpfssb4k8p5pwlkh1ga2vv")))
+        (hash (base32 "0xm901zvvrwsb9k88la6pb65nybi43bygiyz1z68njwsx6ripxik")))
    (make-linux-libre-source version
                             (%upstream-linux-source version hash)
                             deblob-scripts-5.8)))
diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
index 80a8618729..49489b6159 100644
--- a/gnu/system/vm.scm
+++ b/gnu/system/vm.scm
@@ -376,6 +376,10 @@ system that is passed to 'populate-root-file-system'."
 
              (set-path-environment-variable "PATH" '("bin" "sbin") inputs)
 
+             (setenv "PATH"
+                     (string-append #+(file-append strace "/bin") ":"
+                                    (getenv "PATH")))
+
              (let* ((graphs     '#$(match inputs
                                      (((names . _) ...)
                                       names)))

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-15 23:06       ` Leo Famulari
@ 2020-09-16 14:12         ` Leo Famulari
  2020-09-16 14:52           ` Danny Milosavljevic
  0 siblings, 1 reply; 17+ messages in thread
From: Leo Famulari @ 2020-09-16 14:12 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Mathieu Othacehe, 43344

On Tue, Sep 15, 2020 at 07:06:28PM -0400, Leo Famulari wrote:
> I will try to reproduce the bug with 5.8.9 now. I will try the bisection
> if time permits.

It also fails with 5.8.9.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-16 14:12         ` Leo Famulari
@ 2020-09-16 14:52           ` Danny Milosavljevic
  2020-09-16 15:02             ` Danny Milosavljevic
  2020-09-16 16:22             ` Leo Famulari
  0 siblings, 2 replies; 17+ messages in thread
From: Danny Milosavljevic @ 2020-09-16 14:52 UTC (permalink / raw)
  To: Leo Famulari; +Cc: Mathieu Othacehe, 43344

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

commit 692d0626557451c4b557397f20b7394b612d0289
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Sep 1 11:59:41 2020 +0200

    block: fix locking in bdev_del_partition
    
    [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
    
    We need to hold the whole device bd_mutex to protect against
    other thread concurrently deleting out partition before we get
    to it, and thus causing a use after free.
    
    Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
    Reported-by: syzbot+6448f3c229bc52b82f69@syzkaller.appspotmail.com
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

?

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-16 14:52           ` Danny Milosavljevic
@ 2020-09-16 15:02             ` Danny Milosavljevic
  2020-09-16 16:22             ` Leo Famulari
  1 sibling, 0 replies; 17+ messages in thread
From: Danny Milosavljevic @ 2020-09-16 15:02 UTC (permalink / raw)
  To: Leo Famulari; +Cc: Mathieu Othacehe, 43344

[-- Attachment #1: Type: text/plain, Size: 1795 bytes --]

On Wed, 16 Sep 2020 16:52:45 +0200
Danny Milosavljevic <dannym@scratchpost.org> wrote:

> commit 692d0626557451c4b557397f20b7394b612d0289
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Tue Sep 1 11:59:41 2020 +0200
> 
>     block: fix locking in bdev_del_partition
>     
>     [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
>     
>     We need to hold the whole device bd_mutex to protect against
>     other thread concurrently deleting out partition before we get
>     to it, and thus causing a use after free.
>     
>     Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
>     Reported-by: syzbot+6448f3c229bc52b82f69@syzkaller.appspotmail.com
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>     Signed-off-by: Sasha Levin <sashal@kernel.org>
> 

int bdev_del_partition(struct block_device *bdev, int partno)
{
        struct block_device *bdevp;
        struct hd_struct *part = NULL;
        int ret;

        bdevp = bdget_disk(bdev->bd_disk, partno);
        if (!bdevp)
                return -ENOMEM; <--------------

...
}

struct block_device *bdget_disk(struct gendisk *disk, int partno)
{
        struct hd_struct *part;
        struct block_device *bdev = NULL;

        part = disk_get_part(disk, partno);
        if (part)
                bdev = bdget(part_devt(part));
        disk_put_part(part);

        return bdev;
}

struct block_device *bdget(dev_t dev)
{
        struct block_device *bdev;
        struct inode *inode;

        inode = iget5_locked(blockdev_superblock, hash(dev),
                        bdev_test, bdev_set, &dev);

        if (!inode)
                return NULL; <--------------------
[...]
}

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-16 14:52           ` Danny Milosavljevic
  2020-09-16 15:02             ` Danny Milosavljevic
@ 2020-09-16 16:22             ` Leo Famulari
  2020-09-17  1:36               ` Mark H Weaver
  1 sibling, 1 reply; 17+ messages in thread
From: Leo Famulari @ 2020-09-16 16:22 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: Mathieu Othacehe, 43344

[-- Attachment #1: Type: text/plain, Size: 930 bytes --]

On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
> commit 692d0626557451c4b557397f20b7394b612d0289
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Tue Sep 1 11:59:41 2020 +0200
> 
>     block: fix locking in bdev_del_partition
>     
>     [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
>     
>     We need to hold the whole device bd_mutex to protect against
>     other thread concurrently deleting out partition before we get
>     to it, and thus causing a use after free.
>     
>     Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
>     Reported-by: syzbot+6448f3c229bc52b82f69@syzkaller.appspotmail.com
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>     Signed-off-by: Sasha Levin <sashal@kernel.org>
> 
> ?

Do you think that's the faulty commit? I won't have time to test it
today.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-16 16:22             ` Leo Famulari
@ 2020-09-17  1:36               ` Mark H Weaver
  2020-09-17  2:21                 ` Leo Famulari
  2020-09-17 14:05                 ` Leo Famulari
  0 siblings, 2 replies; 17+ messages in thread
From: Mark H Weaver @ 2020-09-17  1:36 UTC (permalink / raw)
  To: Leo Famulari, Danny Milosavljevic; +Cc: Mathieu Othacehe, 43344

Leo Famulari <leo@famulari.name> writes:

> On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
>> commit 692d0626557451c4b557397f20b7394b612d0289
>> Author: Christoph Hellwig <hch@lst.de>
>> Date:   Tue Sep 1 11:59:41 2020 +0200
>> 
>>     block: fix locking in bdev_del_partition
>>     
>>     [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
>>     
>>     We need to hold the whole device bd_mutex to protect against
>>     other thread concurrently deleting out partition before we get
>>     to it, and thus causing a use after free.
>>     
>>     Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
>>     Reported-by: syzbot+6448f3c229bc52b82f69@syzkaller.appspotmail.com
>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>     Signed-off-by: Sasha Levin <sashal@kernel.org>
>> 
>> ?
>
> Do you think that's the faulty commit? I won't have time to test it
> today.

Looks like a good bet to me.  Many thanks to Leo for testing 5.8.9, and
to Danny for identifying the likely culprit.

Next, it probably makes sense to test 5.8.9 with the above commit
reverted.  Would someone like to try it?  If that works, we can avoid
the bisection, resume kernel updates, and revert this change across all
of our kernels until a better solution is found.

        Mark




^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-17  1:36               ` Mark H Weaver
@ 2020-09-17  2:21                 ` Leo Famulari
  2020-09-17  8:40                   ` Marius Bakke
  2020-09-17 14:05                 ` Leo Famulari
  1 sibling, 1 reply; 17+ messages in thread
From: Leo Famulari @ 2020-09-17  2:21 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Mathieu Othacehe, 43344

On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> Next, it probably makes sense to test 5.8.9 with the above commit
> reverted.  Would someone like to try it?  If that works, we can avoid
> the bisection, resume kernel updates, and revert this change across all
> of our kernels until a better solution is found.

I'll do this tonight.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-17  2:21                 ` Leo Famulari
@ 2020-09-17  8:40                   ` Marius Bakke
  2020-09-17 14:20                     ` Leo Famulari
  2020-09-17 19:08                     ` Mark H Weaver
  0 siblings, 2 replies; 17+ messages in thread
From: Marius Bakke @ 2020-09-17  8:40 UTC (permalink / raw)
  To: Leo Famulari, Mark H Weaver; +Cc: Mathieu Othacehe, 43344

[-- Attachment #1: Type: text/plain, Size: 627 bytes --]

Leo Famulari <leo@famulari.name> writes:

> On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
>> Next, it probably makes sense to test 5.8.9 with the above commit
>> reverted.  Would someone like to try it?  If that works, we can avoid
>> the bisection, resume kernel updates, and revert this change across all
>> of our kernels until a better solution is found.
>
> I'll do this tonight.

A fix for this is available in mainline:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a

(via <https://gitlab.gnome.org/GNOME/gparted/-/issues/111>)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-17  1:36               ` Mark H Weaver
  2020-09-17  2:21                 ` Leo Famulari
@ 2020-09-17 14:05                 ` Leo Famulari
  2020-09-17 15:28                   ` Danny Milosavljevic
  1 sibling, 1 reply; 17+ messages in thread
From: Leo Famulari @ 2020-09-17 14:05 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Mathieu Othacehe, 43344

On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> > On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
> >> commit 692d0626557451c4b557397f20b7394b612d0289
> >> Author: Christoph Hellwig <hch@lst.de>
> >> Date:   Tue Sep 1 11:59:41 2020 +0200
> >> 
> >>     block: fix locking in bdev_del_partition

> Next, it probably makes sense to test 5.8.9 with the above commit
> reverted.  Would someone like to try it?  If that works, we can avoid
> the bisection, resume kernel updates, and revert this change across all
> of our kernels until a better solution is found.

Using linux-libre 5.8.9 with that commit reverted, `guix system vm` does
work again.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-17  8:40                   ` Marius Bakke
@ 2020-09-17 14:20                     ` Leo Famulari
  2020-09-17 19:08                     ` Mark H Weaver
  1 sibling, 0 replies; 17+ messages in thread
From: Leo Famulari @ 2020-09-17 14:20 UTC (permalink / raw)
  To: Marius Bakke; +Cc: Mathieu Othacehe, 43344

[-- Attachment #1: Type: text/plain, Size: 808 bytes --]

On Thu, Sep 17, 2020 at 10:40:07AM +0200, Marius Bakke wrote:
> Leo Famulari <leo@famulari.name> writes:
> 
> > On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> >> Next, it probably makes sense to test 5.8.9 with the above commit
> >> reverted.  Would someone like to try it?  If that works, we can avoid
> >> the bisection, resume kernel updates, and revert this change across all
> >> of our kernels until a better solution is found.
> >
> > I'll do this tonight.
> 
> A fix for this is available in mainline:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a
> 
> (via <https://gitlab.gnome.org/GNOME/gparted/-/issues/111>)

What is the recommended way to report this to the stable maintainers?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-17 14:05                 ` Leo Famulari
@ 2020-09-17 15:28                   ` Danny Milosavljevic
  0 siblings, 0 replies; 17+ messages in thread
From: Danny Milosavljevic @ 2020-09-17 15:28 UTC (permalink / raw)
  To: Leo Famulari; +Cc: Mathieu Othacehe, 43344

[-- Attachment #1: Type: text/plain, Size: 1401 bytes --]

Hi Leo,

On Thu, 17 Sep 2020 10:05:44 -0400
Leo Famulari <leo@famulari.name> wrote:

> On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> > > On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:  
> > >> commit 692d0626557451c4b557397f20b7394b612d0289
> > >> Author: Christoph Hellwig <hch@lst.de>
> > >> Date:   Tue Sep 1 11:59:41 2020 +0200
> > >> 
> > >>     block: fix locking in bdev_del_partition  
> 
> > Next, it probably makes sense to test 5.8.9 with the above commit
> > reverted.  Would someone like to try it?  If that works, we can avoid
> > the bisection, resume kernel updates, and revert this change across all
> > of our kernels until a better solution is found.  
> 
> Using linux-libre 5.8.9 with that commit reverted, `guix system vm` does
> work again.

Thanks for testing it!

What the commit does is lock the drive whose partition table is being
modified.  That sounds like a good idea in general.  But parted does not
expect the error code ENOMEM and thus misjudges the situation.

See also:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a

for the fix that goes into Linux 5.9.

We could also patch parted to also accept ENOMEM in addition to ENXIO if we
wanted to.  I wouldn't (it diverges from what upstream is doing--too much work).

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* bug#43344: "basic" system tests fail (and all the other ones) on guix master
  2020-09-17  8:40                   ` Marius Bakke
  2020-09-17 14:20                     ` Leo Famulari
@ 2020-09-17 19:08                     ` Mark H Weaver
  1 sibling, 0 replies; 17+ messages in thread
From: Mark H Weaver @ 2020-09-17 19:08 UTC (permalink / raw)
  To: Marius Bakke, Leo Famulari; +Cc: Mathieu Othacehe, 43344

Marius Bakke <marius@gnu.org> writes:

> Leo Famulari <leo@famulari.name> writes:
>
>> On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
>>> Next, it probably makes sense to test 5.8.9 with the above commit
>>> reverted.  Would someone like to try it?  If that works, we can avoid
>>> the bisection, resume kernel updates, and revert this change across all
>>> of our kernels until a better solution is found.
>>
>> I'll do this tonight.
>
> A fix for this is available in mainline:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a
>
> (via <https://gitlab.gnome.org/GNOME/gparted/-/issues/111>)

Thank you, Marius!  I see that this patch has been included in the
recently released linux-libre-5.8.10.  Assuming that it fixes our
problem (which I suspect it does), we can simply update our kernels and
close this bug.

Leo, would you like to update our kernels?  I won't be able to get to it
today, and I'm looking to offload that job anyway.

      Mark




^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-09-17 19:11 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-11 17:50 bug#43344: "basic" system tests fail (and all the other ones) on guix master Danny Milosavljevic
2020-09-14 13:26 ` Mathieu Othacehe
2020-09-14 16:31   ` Danny Milosavljevic
2020-09-15 22:34     ` Mark H Weaver
2020-09-15 23:06       ` Leo Famulari
2020-09-16 14:12         ` Leo Famulari
2020-09-16 14:52           ` Danny Milosavljevic
2020-09-16 15:02             ` Danny Milosavljevic
2020-09-16 16:22             ` Leo Famulari
2020-09-17  1:36               ` Mark H Weaver
2020-09-17  2:21                 ` Leo Famulari
2020-09-17  8:40                   ` Marius Bakke
2020-09-17 14:20                     ` Leo Famulari
2020-09-17 19:08                     ` Mark H Weaver
2020-09-17 14:05                 ` Leo Famulari
2020-09-17 15:28                   ` Danny Milosavljevic
2020-09-16 13:59 ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).