From: Giovanni Biscuolo <g@xelera.eu>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org, Julien Lepiller <julien@lepiller.eu>
Subject: Re: networking service not starting with netlink-response-error errno:17
Date: Mon, 17 Jun 2024 17:12:15 +0200
Message-ID: <87jzind2nk.fsf@xelera.eu>
In-Reply-To: <87cyof7lfo.fsf@gnu.org>
Hi Ludovic,
executive summary: it is (was) a "network architecture" mistake on my
side, since I was mixing a device configured with static-networking via
Guix with a bridge defined via libvirt... and this is not good. The more
I think about it, the more I'm convinced that trying to add a route for
device "swws-bridge" (see below) in the "eno1" [1] static-networking
declaration is simply a... mistake.
Julien, I'm adding you in Cc: only because you develop guile-netlink and
maybe you could see if it's possible to improve netlink-related error
messages.
Ludovic Courtès <ludo@gnu.org> writes:
> Giovanni Biscuolo <g@xelera.eu> skribis:
>
>> after a reboot on a running remote host (it was running since several
>> guix system generations ago... but with no reboots meanwhile) I get a
>> failing networking service and consequently the ssh service (et al)
>> refuses to start :-(
>>
>> Sorry I've no text to show you but a screenshot (see attachment below)
>> because I'm connecting with a remote KVM console appliance.
In a follow-up message I was then able to copy the actual error message:
--8<---------------cut here---------------start------------->8---
Jun 14 11:28:32 localhost vmunix: [ 6.258520] shepherd[1]: Starting service networking...
Jun 14 11:28:32 localhost vmunix: [ 6.472949] shepherd[1]: Service networking failed to start.
Jun 14 11:28:32 localhost vmunix: [ 6.474842] shepherd[1]: Exception caught while starting networking: (no-such-device "swws-bridge")
Jun 14 11:28:32 localhost vmunix: [ 6.492344] shepherd[1]: Starting service networking...
Jun 14 11:28:32 localhost vmunix: [ 6.509652] shepherd[1]: Exception caught while starting networking: (%exception #<&netlink-response-error errno: 17>)
Jun 14 11:28:32 localhost vmunix: [ 6.510034] shepherd[1]: Service networking failed to start.
--8<---------------cut here---------------end--------------->8---
Then (in the same message) I described how I was able to solve my issue.
This is the "core" of my configuration _mistake_:
--8<---------------cut here---------------start------------->8---
(service static-networking-service-type
         (list (static-networking
                (addresses (list (network-address
                                  (device ane-wan-device)
                                  (value (string-append ane-wan-ip4 "/24")))))
                (routes (list (network-route
                               (destination "default")
                               (gateway ane-wan-gateway))))
                ;; ip route add 10.1.2.0/24 dev swws-bridge via 192.168.133.12
                ;; (network-route
                ;;  (destination "10.1.2.0/24") ;; lxcbr0 net
                ;;  (device swws-bridge-name)
                ;;  (gateway "192.168.133.12")))) ;; on node002
                (name-servers '("185.12.64.1"
                                "185.12.64.1")))))
--8<---------------cut here---------------end--------------->8---
I commented out the second network-route definition, the one using
"swws-bridge" [1] as device to route to 10.1.2.0/24 via 192.168.133.12.
With that code in place, AFAIU the first time shepherd tries to start
the networking service it fails because "swws-bridge" is missing, so
(guile-)netlink raises "no-such-device"; then shepherd tries again but
fails because the very same route is already defined (but not
functional).
A failing networking service (although the interface is up and running)
means that ssh (et al) fails to start, because networking is a
requirement of ssh.
> 17 = EEXIST, which is netlink’s way of saying that the device/route/link
> it’s trying to add already exists.
Ah thanks! I was not able to find that error code.
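For anyone else hunting for the code, here is a minimal sketch of the kind of lookup that would have saved me some time. The table is hand-written from the Linux errno values (it is not taken from guile-netlink), and the helper names `errno_text` and `friendly_error` are made up for this example:

```shell
#!/bin/sh
# errno_text N: print the message and symbolic name for a few Linux errno
# values. Hand-written table covering only the codes relevant here.
errno_text() {
    case "$1" in
        1)  echo "Operation not permitted (EPERM)" ;;
        17) echo "File exists (EEXIST)" ;;
        19) echo "No such device (ENODEV)" ;;
        *)  echo "unknown error (errno $1)" ;;
    esac
}

# friendly_error ACTION ERRNO: build a log line in the style of ip(8).
friendly_error() {
    echo "$1: RTNETLINK answers: $(errno_text "$2")"
}

friendly_error "adding route" 17
```

This prints "adding route: RTNETLINK answers: File exists (EEXIST)".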
When run on the command line I get:
--8<---------------cut here---------------start------------->8---
g@ane ~$ sudo ip route add 10.1.2.0/24 dev swws-bridge via 192.168.133.12
RTNETLINK answers: File exists
--8<---------------cut here---------------end--------------->8---
Is it possible to have the same error and/or a little bit of context in
syslog when this happens with 'network-set-up/linux'?
Anyway, I think that "ip route" should just be idempotent... but maybe
I'm missing something. (and this is obviously not a downstream issue)
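For what it's worth, iproute2 does offer an idempotent spelling: "ip route replace" changes an existing route or adds a new one, so running it twice does not answer "File exists". A minimal sketch, where the `ensure_route` helper and the `IP_CMD` override are hypothetical names introduced only so the command can be dry-run with echo:

```shell
#!/bin/sh
# ensure_route DEST GATEWAY DEVICE
# Uses "ip route replace", which adds the route or changes an existing one,
# instead of "ip route add", which fails with EEXIST on a second run.
# IP_CMD is a hypothetical override so the command can be dry-run.
ensure_route() {
    ${IP_CMD:-ip} route replace "$1" via "$2" dev "$3"
}

# Dry run in a subshell: print the command instead of touching the
# routing table (the real call needs root).
(IP_CMD=echo; ensure_route 10.1.2.0/24 192.168.133.12 swws-bridge)
```

Note that this only covers the "already exists" case: the command still fails if the device itself ("swws-bridge" here) does not exist yet.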
> The problem here is that static networking adds devices, routes, and
> links (see ‘network-set-up/linux’ in the code). If it fails in the
> middle, then it may have added devices without adding routes, so you end
> up with half-configured networking. Ideally this would be
> transactional.
Well, actually it would be a pity to fail a whole static-networking
"just" for a failing /secondary/ route, no?
But as I said in the "executive summary", how could I /dare/ to
declaratively add (with Guix System) a similar route for "swws-bridge"
when "swws-bridge" is managed by libvirt?
I should simply use libvirt to add that! :-)
https://libvirt.org/formatnetwork.html#static-routes
> When that happens, you need to check the logs and use the ‘ip’ command
> to figure out which part failed exactly. In your case, the root problem
> seems to be that “swws-bridge” did not exist.
Yes, I can confirm this
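A minimal sketch of that check, assuming a Linux host where each network interface shows up as an entry under /sys/class/net. The `device_exists` helper and the `SYSFS_NET` override are hypothetical names used only for this example:

```shell
#!/bin/sh
# device_exists NAME: succeed if the kernel currently knows interface NAME.
# On Linux each network interface has an entry under /sys/class/net.
# SYSFS_NET is a hypothetical override, useful only for testing.
device_exists() {
    [ -e "${SYSFS_NET:-/sys/class/net}/$1" ]
}

# Report on the two devices discussed in this thread.
for dev in eno1 swws-bridge; do
    if device_exists "$dev"; then
        echo "$dev: present"
    else
        echo "$dev: missing (routes via this device will fail)"
    fi
done
```

Once the device is there, "ip route show" tells whether the route was left behind by an earlier, half-finished attempt.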
> Then you can (1) manually fix it with ‘ip’, and (2) adjust your Guix
> System config to fix the problems you found.
>
> This is inconvenient at best. I would be interested in hearing
> suggestions on how to improve on this.
Oh well, for my use-case I don't think there is anything to improve:
I just have to keep the "eno1" device configuration _separate_ from the
"swws-bridge" one (even if "swws-bridge" were defined via
static-networking and not libvirt).
The only suggestion I have is to add more "user friendly" error
messages to syslog for netlink-related errors: it would have helped me
more to read "adding route, RTNETLINK answers: File exists" than
"netlink-response-error errno: 17".
Thank you and... happy hacking! Gio'
[1] swws-bridge-name is defined as "swws-bridge"
ane-wan-device is defined as "eno1"
--
Giovanni Biscuolo
Xelera IT Infrastructures