From: Giovanni Biscuolo <g@xelera.eu>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org, Julien Lepiller <julien@lepiller.eu>
Subject: Re: networking service not starting with netlink-response-error errno:17
Date: Mon, 17 Jun 2024 17:12:15 +0200
Message-ID: <87jzind2nk.fsf@xelera.eu>
In-Reply-To: <87cyof7lfo.fsf@gnu.org>

Hi Ludovic,

executive summary: it is (was) a "network architecture" mistake on my
side, since I was mixing a device configured via Guix static-networking
with a bridge defined via libvirt... and this is not good.  The more I
think about it, the more I'm convinced that trying to add a route for
device "swws-bridge" (see below) in the "eno1" [1] static-networking
declaration is simply a... mistake.

Julien, I'm adding you in Cc: only because you develop guile-netlink and
maybe you could see whether it's possible to improve netlink-related
error messages.

Ludovic Courtès <ludo@gnu.org> writes:

> Giovanni Biscuolo <g@xelera.eu> skribis:
>
>> after a reboot on a running remote host (it was running since several
>> guix system generations ago... but with no reboots meanwhile) I get a
>> failing networking service and consequently the ssh service (et al)
>> refuses to start :-(
>>
>> Sorry I've no text to show you but a screenshot (see attachment below)
>> because I'm connecting with a remote KVM console appliance.

In a follow-up message I was then able to copy the actual error message:

--8<---------------cut here---------------start------------->8---

Jun 14 11:28:32 localhost vmunix: [    6.258520] shepherd[1]: Starting service networking...
Jun 14 11:28:32 localhost vmunix: [    6.472949] shepherd[1]: Service networking failed to start.
Jun 14 11:28:32 localhost vmunix: [    6.474842] shepherd[1]: Exception caught while starting networking: (no-such-device "swws-bridge")
Jun 14 11:28:32 localhost vmunix: [    6.492344] shepherd[1]: Starting service networking...
Jun 14 11:28:32 localhost vmunix: [    6.509652] shepherd[1]: Exception caught while starting networking: (%exception #<&netlink-response-error errno: 17>)
Jun 14 11:28:32 localhost vmunix: [    6.510034] shepherd[1]: Service networking failed to start.

--8<---------------cut here---------------end--------------->8---

Then (in the same message) I described how I was able to solve my issue.
This is the "core" of my configuration _mistake:_

--8<---------------cut here---------------start------------->8---

            (service static-networking-service-type
                     (list (static-networking
                            (addresses (list (network-address
                                              (device ane-wan-device)
                                              (value (string-append ane-wan-ip4 "/24")))))
                            (routes (list (network-route
                                           (destination "default")
                                           (gateway ane-wan-gateway))))
                                          ;; ip route add 10.1.2.0/24 dev swws-bridge via 192.168.133.12
                                          ;; (network-route
                                          ;;  (destination "10.1.2.0/24")   ;; lxcbr0 net
                                          ;;  (device swws-bridge-name)
                                          ;;  (gateway "192.168.133.12")))) ;; on node002
                            (name-servers '("185.12.64.1"
                                            "185.12.64.1")))))

--8<---------------cut here---------------end--------------->8---

I commented out the second network-route definition, the one using
"swws-bridge" [1] as device to route to 10.1.2.0/24 via 192.168.133.12.

With that code in place, AFAIU, the first time shepherd tries to start
the networking service it fails because "swws-bridge" is missing and
(guile-)netlink raises "no-such-device"; then it tries again, but fails
because the very same route is already defined (but not functional).

A failing networking service (although the interface is up and running)
means that ssh (et al) fails to start, because ssh requires the
networking service.

> 17 = EEXIST, which is netlink’s way of saying that the device/route/link
> it’s trying to add already exists.

Ah thanks!  I was not able to find that error code.

When run on the command line I get:

--8<---------------cut here---------------start------------->8---

g@ane ~$ sudo ip route add 10.1.2.0/24 dev swws-bridge via 192.168.133.12
RTNETLINK answers: File exists

--8<---------------cut here---------------end--------------->8---

Is it possible to have the same error and/or a little bit of context in
syslog when this happens with 'network-set-up/linux'?

Anyway, I think that "ip route" should just be idempotent... but maybe
I'm missing something. (and this is obviously not a downstream issue)
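
On the idempotency point: iproute2 does provide "ip route replace",
which adds the route or overwrites a matching one instead of answering
"File exists".  A minimal sketch of how that could be wrapped, assuming
a hypothetical helper named ensure_route and an overridable IP variable
for dry runs (neither comes from Guix or guile-netlink):

```shell
#!/bin/sh
# Hypothetical helper, not part of Guix or guile-netlink.
# IP can be overridden for a dry run, e.g. IP="echo ip".
IP=${IP:-ip}

ensure_route() {
    dest=$1
    dev=$2
    gw=$3

    # Refuse early if the device does not exist (yet), instead of
    # letting the route command fail halfway through the setup.
    if ! $IP link show dev "$dev" >/dev/null 2>&1; then
        echo "device $dev does not exist; skipping route $dest" >&2
        return 1
    fi

    # "replace" creates the route or overwrites an existing one, so
    # running this twice does not end in "File exists".
    $IP route replace "$dest" dev "$dev" via "$gw"
}
```

For the route discussed here that would be
"ensure_route 10.1.2.0/24 swws-bridge 192.168.133.12", run as root.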

> The problem here is that static networking adds devices, routes, and
> links (see ‘network-set-up/linux’ in the code).  If it fails in the
> middle, then it may have added devices without adding routes, so you end
> up with half-configured networking.  Ideally this would be
> transactional.

Well, actually it would be a pity to fail a whole static-networking
"just" for a failing /secondary/ route, no?

But as I said in the "executive summary", how could I /dare/ to
declaratively add (with Guix System) a route like that for "swws-bridge"
when "swws-bridge" is managed by libvirt?

I should simply use libvirt to add that! :-)
https://libvirt.org/formatnetwork.html#static-routes

> When that happens, you need to check the logs and use the ‘ip’ command
> to figure out which part failed exactly.  In your case, the root problem
> seems to be that “swws-bridge” did not exist.

Yes, I can confirm this

> Then you can (1) manually fix it with ‘ip’, and (2) adjust your Guix
> System config to fix the problems you found.
>
> This is inconvenient at best.  I would be interested in hearing
> suggestions on how to improve on this.

Oh well, for my use-case I don't think there is anything to improve:
I just have to keep the "eno1" device configuration _separate_ from the
"swws-bridge" one (even if "swws-bridge" were defined via
static-networking rather than libvirt).

The only suggestion I have is to add more "user friendly" error
messages in syslog for netlink-related errors; it would have helped me
more to read "adding route, RTNETLINK answers: File exists" than
"netlink-response-error errno: 17".
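
To illustrate the kind of message meant here, a minimal sketch that
turns an errno number into the wording "ip" prints.  The function names
are hypothetical, the table covers only a handful of values, and the
numbers assume the usual Linux errno numbering (e.g. x86_64); this is
not how guile-netlink or shepherd report errors today.

```shell
#!/bin/sh
# Hypothetical lookup table for a few errno values that netlink
# operations commonly return (Linux numbering, assumed x86_64).
netlink_strerror() {
    case "$1" in
        1)   echo "Operation not permitted" ;;
        17)  echo "File exists" ;;
        19)  echo "No such device" ;;
        22)  echo "Invalid argument" ;;
        101) echo "Network is unreachable" ;;
        *)   echo "errno $1" ;;   # unknown value: fall back to the number
    esac
}

# Hypothetical formatter: say what was being attempted, then what
# netlink answered, in the style of the "ip" command.
describe_netlink_error() {
    printf '%s, RTNETLINK answers: %s\n' "$1" "$(netlink_strerror "$2")"
}

describe_netlink_error "adding route" 17
```

With errno 17 this prints "adding route, RTNETLINK answers: File
exists", the message suggested above.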

Thank you and... happy hacking! Gio'


[1] swws-bridge-name is defined as "swws-bridge"
    ane-wan-device is defined as "eno1"    

-- 
Giovanni Biscuolo

Xelera IT Infrastructures
