unofficial mirror of bug-guix@gnu.org 
* bug#64653: ‘static-networking’ fails to start
From: Ludovic Courtès @ 2023-07-15 20:04 UTC (permalink / raw)
  To: 64653

Hi!

On the machine that exhibited <https://issues.guix.gnu.org/63516>, I’m
now seeing this, with the fix from commit
26602f4063a6e0c626e8deb3423166bcd0abeb90:

--8<---------------cut here---------------start------------->8---
[  121.017522] shepherd[1]: Starting service user-homes...
[  121.049038] tg3 0000:05:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address b8:cb:29:b5:1c:3a
[  121.049042] tg3 0000:05:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[  121.049044] tg3 0000:05:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[  121.049045] tg3 0000:05:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
[  121.084342] tg3 0000:05:00.1 eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address b8:cb:29:b5:1c:3b
[  121.084355] tg3 0000:05:00.1 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[  121.084363] tg3 0000:05:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[  121.084370] tg3 0000:05:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[  121.102367] iTCO_vendor_support: vendor-support=0
[  121.103831] Error: Driver 'pcspkr' is already registered, aborting...
[  121.108617] dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.4)
[  121.113037] tg3 0000:05:00.1 eno2: renamed from eth1

[...]

[  121.281600] shepherd[1]: Service user-homes has been started.
[  121.282538] shepherd[1]: Service user-homes started.
[  121.368316] ipmi_si IPI0001:00: Using irq 10
[  121.405790] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
[  121.419871] shepherd[1]: Exception caught while starting #<<service> 7f19889012a0>: (wrong-type-arg "port-filename" "Wrong type argument in position ~A: ~S" (1 #<closed: file 7f1981887000>) (#<closed: file 7f1981887000>))
[  121.420074] shepherd[1]: Service user-homes running with value #t.
[  121.420218] shepherd[1]: Service networking failed to start.
--8<---------------cut here---------------end--------------->8---

The failure seems to happen after the whole static networking config has
been set up though (‘ip a’ shows that everything’s in place).

The problem is that at this point ‘networking’ cannot be started unless
you manually tear down everything with ‘ip’:

--8<---------------cut here---------------start------------->8---
$ sudo herd start networking
herd: error: exception caught while executing 'start' on service 'networking':
Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.
--8<---------------cut here---------------end--------------->8---

(17 = EEXIST)

This makes me think we should make the setup phase idempotent or,
alternatively, add special actions to force a change.
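
One way to read “idempotent” here is that a setup step failing with EEXIST (errno 17) counts as already done.  The sketch below is a hypothetical shell illustration of that idea, not the code used by ‘static-networking’ (which talks netlink from Guile).  The helper name `idempotent` and the two stand-in commands are made up, and it assumes the wrapped command reports EEXIST with the usual strerror text “File exists”, as some ‘ip’ invocations do.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guix or the Shepherd): run a setup
# command and treat an "already exists" failure as success, so that the
# whole setup can be re-run safely.
idempotent() {
    # Capture stdout and stderr together; on success the output is
    # discarded and we return right away.
    output=$("$@" 2>&1) && return 0
    status=$?
    case "$output" in
        *"File exists"*)
            # EEXIST (errno 17): the object is already configured.
            return 0 ;;
    esac
    # Any other failure is reported unchanged.
    printf '%s\n' "$output" >&2
    return "$status"
}

# Stand-ins for real setup commands, used only to demonstrate the helper.
already_there() { echo "RTNETLINK answers: File exists" >&2; return 2; }
unreachable()   { echo "RTNETLINK answers: Network is unreachable" >&2; return 2; }

idempotent already_there && echo "already_there: treated as success"
idempotent unreachable 2>/dev/null || echo "unreachable: still an error"
```

Matching on the error text is fragile.  A real fix would rather inspect the errno carried by the netlink error, but the control flow would be the same.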

Thoughts?

Ludo’.





* bug#64653: stopping ntp and dnsmasq
From: Matt Wette @ 2023-09-17 16:42 UTC (permalink / raw)
  To: 64653

Are there any workarounds for this?  I've been digging for anything that
might help.
I'm dead in the water trying to get ntpd and tftpd (dnsmasq) working;
they require this.
Or is there a way to get dnsmasq working on its own?

Matt






* bug#64653: stopping ntp and dnsmasq
From: Matt Wette @ 2023-09-17 17:09 UTC (permalink / raw)
  To: 64653

On 9/17/23 9:42 AM, Matt Wette wrote:
> Are there any workarounds for this?  I've been digging for anything that
> might help.
> I'm dead in the water trying to get ntpd and tftpd (dnsmasq) working;
> they require this.
> Or is there a way to get dnsmasq working on its own?

I see there is atftp, so I'll try that.  Still no working ntpd.





* bug#64653: ‘static-networking’ fails to start
From: Ludovic Courtès @ 2023-10-02 11:59 UTC (permalink / raw)
  To: 64653; +Cc: Christopher Baines, Matt Wette

Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

> [  121.281600] shepherd[1]: Service user-homes has been started.
> [  121.282538] shepherd[1]: Service user-homes started.
> [  121.368316] ipmi_si IPI0001:00: Using irq 10
> [  121.405790] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
> [  121.419871] shepherd[1]: Exception caught while starting #<<service> 7f19889012a0>: (wrong-type-arg "port-filename" "Wrong type argument in position ~A: ~S" (1 #<closed: file 7f1981887000>) (#<closed: file 7f1981887000>))
> [  121.420074] shepherd[1]: Service user-homes running with value #t.
> [  121.420218] shepherd[1]: Service networking failed to start.
>
>
> The failure seems to happen after the whole static networking config has
> been set up though (‘ip a’ shows that everything’s in place).
>
> Problem is that at this point ‘networking’ cannot be started unless you
> manually tear down everything with ‘ip’:
>
> $ sudo herd start networking
> herd: error: exception caught while executing 'start' on service 'networking':
> Throw to key `%exception' with args `("#<&netlink-response-error errno: 17>")'.

Quick workaround if you encounter this bug:

  1. Find the “tear-down” script of your system with:

       guix gc -R /run/current-system |grep tear-down-network

  2. In a ‘screen’ session, run this as root:

       while true ; do herd enable networking; herd start networking; sleep 3; done

  3. Run:

       sudo guile --no-auto-compile TEAR_DOWN_SCRIPT_FROM_STEP_1

Beautiful, isn’t it?

(We’ll actually work on fixing the bug, too…)
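
Steps 1 and 3 of the workaround above can be chained in a small script.  This is only a sketch: the function name `pick_tear_down` and the `RUN_WORKAROUND` guard are made up for illustration.  It assumes the first matching store item is the right tear-down script, and that the retry loop from step 2 is already running in another session.

```shell
#!/bin/sh
# Print the first store item whose name mentions "tear-down-network",
# reading one store path per line from standard input.
pick_tear_down() {
    grep tear-down-network | head -n 1
}

# Only touch the system when explicitly asked to (hypothetical guard).
if [ "${RUN_WORKAROUND:-no}" = yes ]; then
    # Step 1: locate the tear-down script of the running system.
    script=$(guix gc -R /run/current-system | pick_tear_down)
    if [ -z "$script" ]; then
        echo "no tear-down script found" >&2
        exit 1
    fi
    # Step 3: run it as root; the loop from step 2 then restarts 'networking'.
    sudo guile --no-auto-compile "$script"
fi
```

Saved under some name of your choosing, say workaround.sh, it would be invoked as ‘RUN_WORKAROUND=yes sh workaround.sh’.  Without the variable it does nothing.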

Ludo’.





* bug#64653: ‘static-networking’ fails to start
From: Leo Nikkilä via Bug reports for GNU Guix @ 2023-11-11 16:25 UTC (permalink / raw)
  To: 64653

I'm also seeing this issue on a headless RockPro64 system. Do you know of anything I could change in the configuration to work around this during boot, e.g. by patching out a specific commit?

Happy to provide further details or test things on my system.





* bug#64653: ‘static-networking’ fails to start
From: Ludovic Courtès @ 2024-01-03 23:42 UTC (permalink / raw)
  To: 64653

Hello!

Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

> [  121.282538] shepherd[1]: Service user-homes started.
> [  121.368316] ipmi_si IPI0001:00: Using irq 10
> [  121.405790] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
> [  121.419871] shepherd[1]: Exception caught while starting #<<service> 7f19889012a0>: (wrong-type-arg "port-filename" "Wrong type argument in position ~A: ~S" (1 #<closed: file 7f1981887000>) (#<closed: file 7f1981887000>))
> [  121.420074] shepherd[1]: Service user-homes running with value #t.
> [  121.420218] shepherd[1]: Service networking failed to start.

I’m seeing a similar exception in a Hurd VM running shepherd 0.10.3rc1:

--8<---------------cut here---------------start------------->8---
Jan  3 23:13:22 localhost shepherd[1]: Exception caught while starting networking: (wrong-type-arg "port-filename" "Wrong type argument in position ~A: ~S" (1 #<closed: file 207e498>) (#<closed: file 207e498>)) 
Jan  3 23:13:22 localhost shepherd[1]: Service networking failed to start. 
--8<---------------cut here---------------end--------------->8---

It’s interesting because it suggests that the offending ‘port-filename’
call comes from ‘load’, not from the network-setup code being loaded
(here, the /hurd/pfinet translator has been properly set up).

Looking at the code in ‘boot-9.scm’, I *think* we end up calling
‘primitive-load’; ‘shepherd’ replaces it with its own (@ (shepherd
support) primitive-load*).

I managed to grab this backtrace:

--8<---------------cut here---------------start------------->8---
Evaluating user expression (catch #t (lambda () (load "/gnu/store/64?")) # ?).
starting '/gnu/store/gn8q7p790a9zdnlciyp1vlncpin366r0-hurd-v0.9.git20230216/hurd/pfinet "--ipv6" "/servers/socket/26" "--interface" "/dev/eth0" "--address" "10.0.2.15" "--netmask" "255.255.255.0" "--gateway" "10.0.2.2"'
In ice-9/boot-9.scm:
    142:2  7 (dynamic-wind #<procedure 20393a0 at ice-9/eval.scm:33?> ?)
In shepherd/support.scm:
   486:15  6 (_ #<closed: file 50a7e38>)
In ice-9/read.scm:
   859:19  5 (read _)
In unknown file:
           4 (port-filename #<closed: file 50a7e38>)
In ice-9/boot-9.scm:
  1685:16  3 (raise-exception _ #:continuable? _)
  1780:13  2 (_ #<&compound-exception components: (#<&assertion-fail?>)
In ice-9/eval.scm:
    159:9  1 (_ #(#(#<module (#{ g171}#) 3cd25f0>) (# "port-fil?" ?)))
In unknown file:
           0 (make-stack #t)
#t
--8<---------------cut here---------------end--------------->8---

So it’s indeed ‘read’ as called from ‘primitive-load*’ that stumbles
upon a closed port.  It also happens when loading a file that simply
suspends the current fiber via ‘sleep’ or similar, though only on the
Hurd.

To be continued…

Ludo’.





* bug#64653: ‘static-networking’ fails to start
From: Ludovic Courtès @ 2024-01-05 16:32 UTC (permalink / raw)
  To: 64653-done

Hi!

Ludovic Courtès <ludo@gnu.org> wrote:

> Evaluating user expression (catch #t (lambda () (load "/gnu/store/64?")) # ?).
> starting '/gnu/store/gn8q7p790a9zdnlciyp1vlncpin366r0-hurd-v0.9.git20230216/hurd/pfinet "--ipv6" "/servers/socket/26" "--interface" "/dev/eth0" "--address" "10.0.2.15" "--netmask" "255.255.255.0" "--gateway" "10.0.2.2"'
> In ice-9/boot-9.scm:
>     142:2  7 (dynamic-wind #<procedure 20393a0 at ice-9/eval.scm:33?> ?)
> In shepherd/support.scm:
>    486:15  6 (_ #<closed: file 50a7e38>)
> In ice-9/read.scm:
>    859:19  5 (read _)
> In unknown file:
>            4 (port-filename #<closed: file 50a7e38>)
> In ice-9/boot-9.scm:
>   1685:16  3 (raise-exception _ #:continuable? _)
>   1780:13  2 (_ #<&compound-exception components: (#<&assertion-fail?>)
> In ice-9/eval.scm:
>     159:9  1 (_ #(#(#<module (#{ g171}#) 3cd25f0>) (# "port-fil?" ?)))
> In unknown file:
>            0 (make-stack #t)
> #t
>
> So it’s indeed ‘read’ as called from ‘primitive-load*’ that stumbles
> upon a closed port.

Good news: this is fixed by 4e431fda5f2ec76b6d6a271be7c30b1324431329!
Silly me had introduced a ‘dynamic-wind’ there.

(The funny thing with extensible systems like the Shepherd is that the
problem can be anywhere.  :-))

Ludo’.





* bug#64653: works now
From: Matt Wette @ 2024-01-20 21:14 UTC (permalink / raw)
  To: 64653

This bug no longer occurs on my system.  That change occurred over the
last week.





Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
