unofficial mirror of bug-guix@gnu.org 
* bug#68595: VLANs in static-networking-service-type hangs shepherd
@ 2024-01-19 19:12 Lars Rustand
  2024-01-19 23:32 ` Lars Rustand
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Lars Rustand @ 2024-01-19 19:12 UTC
  To: 68595


Like the title says, if you add any VLAN in a
static-networking-service-type it seems like the whole shepherd daemon
freezes up and anything that depends on it stops responding.
Additionally the networking does not get fully configured either.

After configuring a VLAN, `herd status`, `herd restart networking` and
any other herd command hang forever with no output. Even rebooting does
not work. The only remedy is to restart the system using the power
button, but even after the restart the networking service still fails to
start.

The VLAN interfaces appear to be created, but no addresses are assigned
to them.

Steps to reproduce:

1. Add a static network with a VLAN to your system config (see below for
minimal example)
2. Reconfigure your system
3. Restart the networking service with `sudo herd restart networking`
4. Observe that herd does not finish
5. Try to run `herd status`, `guix system reconfigure`, or `sudo reboot`.
6. Observe that none of the commands seem to have any effect, and that
they hang indefinitely with no output

--8<---------------cut here---------------start------------->8---
(service static-networking-service-type
  (list (static-networking
         (links
          (list (network-link
                 (name "myvlan")
                 (type 'vlan)
                 (arguments '((id . 3)
                              (link . "eth0"))))))
         (addresses
          (list (network-address
                 (device "myvlan@eth0")
                 (value "192.168.0.2/24")))))))
--8<---------------cut here---------------end--------------->8---

Alternatively here are the reproduction steps using VM:

1. Build a qcow2 image, make sure there is enough space to reconfigure
   the system. Use --save-provenance so you have the config inside the
   vm so you can reconfigure later.
   `guix system image --image-type=qcow2 --image-size=30G --save-provenance minimal.scm`
2. Copy the qcow image to a writable directory.
3. Start up the vm.
```
sudo qemu-system-x86_64 \
   -nic user,model=virtio-net-pci \
   -enable-kvm -m 2048 \
   -device virtio-blk,drive=myhd \
   -drive if=none,file=1a7wi5mgcy3wrsx6pcnag6qjbb87djwl-image.qcow2,id=myhd
```
4. Edit /run/current-system/configuration.scm and uncomment the static
   networking.
5. Reconfigure the system.
6. Try to restart the networking service with `herd restart networking`.
7. The command will hang indefinitely. Cancel it.
8. Check the network interfaces. The VLAN interface will have been
   created, but it will not have any address.
9. The aforementioned commands will all be unresponsive now.
10. If you reboot your VM you will see that the networking service has
   failed at startup, and if you try to restart the service you will get
   an error: #<&netlink-response-error errno: 17>

--8<---------------cut here---------------start------------->8---
(use-modules
  (gnu)
  (gnu services)
  (gnu services base)
  (gnu services networking)
  (gnu bootloader)
  (gnu bootloader grub)
  (gnu system)
  (gnu system file-systems)
  (gnu system accounts))

(operating-system
  (host-name "minimal")

  (users
    (cons*
      (user-account
        (name "lars")
        (group "users"))
      %base-user-accounts))

  (services
   (cons*
          (service dhcp-client-service-type)
          ;; Commented out so you can uncomment it after booting the VM
          ;;(service static-networking-service-type
          ;;      (list (static-networking
          ;;             (links
          ;;              (list (network-link
          ;;                     (name "myvlan")
          ;;                     (type 'vlan)
          ;;                     (arguments '((id . 3)
          ;;                                  (link . "eth0"))))))
          ;;             (addresses
          ;;              (list (network-address
          ;;                     (device "myvlan@eth0")
          ;;                     (value "192.168.0.2/24")))))))
    %base-services))

   (bootloader
     (bootloader-configuration
       (bootloader grub-bootloader)
       (targets '("/dev/vda"))))

   (file-systems
    (cons*
     %base-file-systems)))
--8<---------------cut here---------------end--------------->8---





* bug#68595: VLANs in static-networking-service-type hangs shepherd
  2024-01-19 19:12 bug#68595: VLANs in static-networking-service-type hangs shepherd Lars Rustand
@ 2024-01-19 23:32 ` Lars Rustand
  2024-02-12  9:55 ` Ludovic Courtès
  2024-02-12 11:59 ` Alexey Abramov via Bug reports for GNU Guix
  2 siblings, 0 replies; 5+ messages in thread
From: Lars Rustand @ 2024-01-19 23:32 UTC
  To: 68595


For fun I tried to use the exact configuration that is mentioned in the
manual and was amazed that it worked, and the networking service is able
to start successfully. Here is the working configuration:

--8<---------------cut here---------------start------------->8---
(static-networking
 (links (list (network-link
               (name "bond0")
               (type 'bond)
               (arguments '((mode . "802.3ad")
                            (miimon . 100)
                            (lacp-active . "on")
                            (lacp-rate . "fast"))))

              (network-link
               (mac-address "98:11:22:33:44:55")
               (arguments '((master . "bond0"))))

              (network-link
               (mac-address "98:11:22:33:44:56")
               (arguments '((master . "bond0"))))

              (network-link
               (name "bond0.1055")
               (type 'vlan)
               (arguments '((id . 1055)
                            (link . "bond0"))))))
 (addresses (list (network-address
                   (value "192.168.1.4/24")
                   (device "bond0.1055")))))
--8<---------------cut here---------------end--------------->8---


However, if I simply substitute the bond interface with a real interface
I get back the error described in my previous message. This
configuration fails:

--8<---------------cut here---------------start------------->8---
(static-networking
 (links (list (network-link
               (name "bond0.1055")
               (type 'vlan)
               (arguments '((id . 1055)
                            (link . "ens3"))))))
 (addresses (list (network-address
                   (value "192.168.1.4/24")
                   (device "bond0.1055")))))
--8<---------------cut here---------------end--------------->8---


So it seems that VLANs do work for bonds, but not for physical network
interfaces. I've done a lot of digging on the internet and cannot find a
single example of anyone using VLANs at all in Guix, so maybe that is
why this problem hasn't been discovered yet.





* bug#68595: VLANs in static-networking-service-type hangs shepherd
  2024-01-19 19:12 bug#68595: VLANs in static-networking-service-type hangs shepherd Lars Rustand
  2024-01-19 23:32 ` Lars Rustand
@ 2024-02-12  9:55 ` Ludovic Courtès
  2024-02-15  9:07   ` Lars Rustand
  2024-02-12 11:59 ` Alexey Abramov via Bug reports for GNU Guix
  2 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2024-02-12  9:55 UTC
  To: Lars Rustand; +Cc: 68595, Julien Lepiller, Alexey Abramov

Hi,

Lars Rustand <rustand.lars@gmail.com> skribis:

> Like the title says, if you add any VLAN in a
> static-networking-service-type it seems like the whole shepherd daemon
> freezes up and anything that depends on it stops responding.
> Additionally the networking does not get fully configured either.
>
> After configuring a VLAN, `herd status`, `herd restart networking` and
> any other herd command hang forever with no output. Even rebooting does
> not work. The only remedy is to restart the system using the power
> button, but even after the restart the networking service still fails to
> start.

Ouch.  Could you check what /var/log/messages reports?

Once you’ve reproduced the hang, could you attach GDB to shepherd and
get a backtrace?

  gdb -p 1
  bt

(I recommend doing that in a VM rather than on your main machine!)

> 1. Add a static network with a VLAN to your system config (see below for
> minimal example)
> 2. Reconfigure your system
> 3. Restart the networking service with `sudo herd restart networking`
> 4. Observe that herd does not finish
> 5. Try to run `herd status`, `guix system reconfigure`, or `sudo reboot`.
> 6. Observe that none of the commands seem to have any effect, and that
> they hang indefinitely with no output
>
> (service static-networking-service-type
>   (list (static-networking
>          (links
>           (list (network-link
>                  (name "myvlan")
>                  (type 'vlan)
>                  (arguments '((id . 3)
>                               (link . "eth0"))))))
>          (addresses
>           (list (network-address
>                  (device "myvlan@eth0")
>                  (value "192.168.0.2/24")))))))

You mentioned in your other message that the example from the manual
works fine.  Could you try and reduce your config until you find which
bit makes it fail?

Cc’ing Alexey and Julien who may know more.

Thanks,
Ludo’.





* bug#68595: VLANs in static-networking-service-type hangs shepherd
  2024-01-19 19:12 bug#68595: VLANs in static-networking-service-type hangs shepherd Lars Rustand
  2024-01-19 23:32 ` Lars Rustand
  2024-02-12  9:55 ` Ludovic Courtès
@ 2024-02-12 11:59 ` Alexey Abramov via Bug reports for GNU Guix
  2 siblings, 0 replies; 5+ messages in thread
From: Alexey Abramov via Bug reports for GNU Guix @ 2024-02-12 11:59 UTC
  To: Lars Rustand; +Cc: 68595

Hi Lars,

Lars Rustand <rustand.lars@gmail.com> writes:

> Like the title says, if you add any VLAN in a
> static-networking-service-type it seems like the whole shepherd daemon
> freezes up and anything that depends on it stops responding.
> Additionally the networking does not get fully configured either.
>
> After configuring a VLAN, `herd status`, `herd restart networking` and
> any other herd command hang forever with no output. Even rebooting does
> not work. The only remedy is to restart the system using the power
> button, but even after the restart the networking service still fails to
> start.
>
> The VLAN interfaces appear to be created, but no addresses are assigned
> to them.
>
> Steps to reproduce:
>
> 1. Add a static network with a VLAN to your system config (see below for
> minimal example)
> 2. Reconfigure your system
> 3. Restart the networking service with `sudo herd restart networking`
> 4. Observe that herd does not finish
> 5. Try to run `herd status`, `guix system reconfigure`, or `sudo reboot`.
> 6. Observe that none of the commands seem to have any effect, and that
> they hang indefinitely with no output
>
> --8<---------------cut here---------------start------------->8---
> (service static-networking-service-type
>   (list (static-networking
>          (links
>           (list (network-link
>                  (name "myvlan")
>                  (type 'vlan)
>                  (arguments '((id . 3)
>                               (link . "eth0"))))))
>          (addresses
>           (list (network-address
>                  (device "myvlan@eth0")
>                  (value "192.168.0.2/24")))))))
> --8<---------------cut here---------------end--------------->8---

I see. Could you please replace the device name "myvlan@eth0" with
"myvlan" in the network-address?

Even though ip link (iproute2) shows 'myvlan@eth0', this is not the
actual name of the interface.
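
A minimal sketch of that change, assuming the rest of the reported
configuration stays as it is. Only the `device` field differs, and
whether this alone resolves the error is not established here:

```scheme
;; Sketch only: identical to the reported configuration, except that
;; the address refers to the link by its real name, "myvlan", rather
;; than the "myvlan@eth0" form that `ip link' displays.
(service static-networking-service-type
  (list (static-networking
         (links
          (list (network-link
                 (name "myvlan")
                 (type 'vlan)
                 (arguments '((id . 3)
                              (link . "eth0"))))))
         (addresses
          (list (network-address
                 (device "myvlan")
                 (value "192.168.0.2/24")))))))
```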

> Alternatively here are the reproduction steps using VM:
>
> 1. Build a qcow2 image, make sure there is enough space to reconfigure
>    the system. Use --save-provenance so you have the config inside the
>    vm so you can reconfigure later.
>    `guix system image --image-type=qcow2 --image-size=30G --save-provenance minimal.scm`
> 2. Copy the qcow image to a writable directory.
> 3. Start up the vm.
> ```
> sudo qemu-system-x86_64 \
>    -nic user,model=virtio-net-pci \
>    -enable-kvm -m 2048 \
>    -device virtio-blk,drive=myhd \
>    -drive if=none,file=1a7wi5mgcy3wrsx6pcnag6qjbb87djwl-image.qcow2,id=myhd
> ```
> 4. Edit /run/current-system/configuration.scm and uncomment the static
>    networking.
> 5. Reconfigure the system.
> 6. Try to restart the networking service with `herd restart networking`.
> 7. The command will hang indefinitely. Cancel it.
> 8. Check the network interfaces. The VLAN interface will have been
>    created, but it will not have any address.
> 9. The aforementioned commands will all be unresponsive now.
> 10. If you reboot your VM you will see that the networking service has
>    failed at startup, and if you try to restart the service you will get
>    an error: #<&netlink-response-error errno: 17>
>

We need to improve our error messaging. Errno 17 is EEXIST, which means
that the interface already exists.
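
The numeric code can be checked from a Guile REPL. This is a small
sketch, assuming a Linux system, where the errno value 17 corresponds
to EEXIST; it only uses Guile's built-in `strerror` procedure and
`EEXIST` constant:

```scheme
;; Assumption: Linux errno numbering, where EEXIST is 17.
;; `strerror' returns the C library's message for an errno value.
(display (strerror 17))
(newline)

;; Compare the raw number against Guile's EEXIST constant.
(display (= EEXIST 17))
(newline)
```

Run with `guile -s`, this prints the C library's message for errno 17,
followed by whether 17 equals EEXIST on the current system.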

-- 
Alexey





* bug#68595: VLANs in static-networking-service-type hangs shepherd
  2024-02-12  9:55 ` Ludovic Courtès
@ 2024-02-15  9:07   ` Lars Rustand
  0 siblings, 0 replies; 5+ messages in thread
From: Lars Rustand @ 2024-02-15  9:07 UTC
  To: Ludovic Courtès; +Cc: 68595, Julien Lepiller, Alexey Abramov


Ludovic Courtès <ludo@gnu.org> writes:

> Ouch.  Could you check what /var/log/messages reports?
>
> Once you’ve reproduced the hang, could you attach GDB to shepherd and
> get a backtrace?
>
>   gdb -p 1
>   bt
>
> (I recommend doing that in a VM rather than on your main machine!)
>

I have unfortunately been unable to reproduce the full shepherd hang,
even though I followed the exact same procedure as before. The command
`herd restart networking` still hangs indefinitely the first time after
adding a VLAN, but this no longer causes the whole shepherd to hang
afterwards.

The basic error 17 still occurs any time I try to start the networking
service while a VLAN is configured.


>
> You mentioned in your other message that the example from the manual
> works fine.  Could you try and reduce your config until you find which
> bit makes it fail?

The configuration I have already attached is as minimal as possible. It
only includes the mandatory OS fields and a minimal static-networking
configuration.

I have already found which bit makes it fail. It is the use of VLAN for
any normal network link. VLANs seem to only work for bond devices as in
the example.

The reproduction steps are perhaps a little over-complicated, however,
and are only necessary in order to reproduce the full "shepherd hangs"
bug, which I am now unable to reproduce anyway. But what I believe is
the root of the problem is the error 17 on starting the networking
service. This can be reproduced much more simply and reliably by just
starting a VM the normal way with the static-networking snippet already
enabled when building it.

So here are the new simplified steps for reproducing only the error 17
and the non-functional VLAN networking:

Use the OS config from my first post, but uncomment the static
networking block. Build and run the VM with `$(guix system vm minimal.scm)`.

That's it.

> Cc’ing Alexey and Julien who may know more.
>
> Thanks,
> Ludo’.



Alexey Abramov <levenson@mmer.org> writes:

> Hi Lars,
>
> I see. Could you please replace the device name "myvlan@eth0" with
> "myvlan" in the network-address?
>
> Even though ip link (iproute2) shows 'myvlan@eth0', this is not the
> actual name of the interface.
>

I have tried with your suggestion, but everything behaves exactly the same.





