all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#35550: Installer: wpa_supplicant fails to start
@ 2019-05-03 19:31 Ludovic Courtès
  2019-05-03 20:51 ` Ludovic Courtès
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-03 19:31 UTC (permalink / raw)
  To: 35550; +Cc: sirgazil

Hello,

Ricardo and sirgazil reported on IRC that the ‘wpa-supplicant’ would
(sometimes?) fail to start for the in the installation image on their
laptop.

As a result the ‘networking’ service (Connman) would not start either;
thus, the installer would report that it failed to establish Internet
access, and would just stop there.

Sirgazil posted /var/log/messages, which shows this:

--8<---------------cut here---------------start------------->8---
667 May  3 16:03:40 localhost vmunix: [    9.164363] ieee80211 phy0: Atheros AR9565 Rev:2 mem=0xffffa88240e80000, irq=19
668 May  3 16:03:40 localhost vmunix: [    9.172514] ath9k 0000:02:00.0 wlp2s0: renamed from wlan0
669 May  3 16:03:40 localhost vmunix: [    9.199748] shepherd[1]: Service udev has been started.
670 May  3 16:03:40 localhost vmunix: [    9.200682] shepherd[1]: Service file-system-/tmp has been started.
671 May  3 16:03:40 localhost vmunix: [    9.201560] shepherd[1]: Service file-system-/dev/pts has been started.
672 May  3 16:03:40 localhost vmunix: [    9.202423] shepherd[1]: Service file-system-/dev/shm has been started.
673 May  3 16:03:40 localhost vmunix: [    9.203232] shepherd[1]: Service file-system-/gnu/store has been started.
674 May  3 16:03:40 localhost vmunix: [    9.203891] shepherd[1]: Service file-systems has been started.
675 May  3 16:03:40 localhost vmunix: [    9.207605] shepherd[1]: Service user-processes has been started.
676 May  3 16:03:40 localhost vmunix: [    9.208993] shepherd[1]: Service host-name has been started.
677 May  3 16:03:40 localhost vmunix: [    9.217925] shepherd[1]: Service user-homes has been started.
678 May  3 16:03:40 localhost vmunix: [   10.221211] shepherd[1]: Service dbus-system has been started.
679 May  3 16:03:40 localhost vmunix: [   10.222093] shepherd[1]: Service loopback has been started.
680 May  3 16:03:40 localhost wpa_supplicant[398]: Successfully initialized wpa_supplicant
681 May  3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could not be started.
682 May  3 16:03:40 localhost shepherd[1]: Service networking depends on wpa-supplicant.
683 May  3 16:03:40 localhost shepherd[1]: Service networking could not be started.
684 May  3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not request service name: already registered
685 May  3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize wpa_supplicant
686 May  3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could not be started.
687 May  3 16:03:45 localhost nscd: 401 monitoring file `/etc/hosts` (1)
688 May  3 16:03:45 localhost nscd: 401 monitoring directory `/etc` (2)
689 May  3 16:03:45 localhost nscd: 401 monitoring file `/etc/resolv.conf` (3)
690 May  3 16:03:45 localhost nscd: 401 monitoring directory `/etc` (2)
691 May  3 16:03:46 localhost shepherd[1]: Service nscd has been started.
692 May  3 16:03:46 localhost /gnu/store/3dnww56dsrd6f9pf9p8ll3yawhfp50d8-gpm-1.20.7/sbin/gpm[409]: *** info [daemon/startup.c(136)]:
693 May  3 16:03:46 localhost /gnu/store/3dnww56dsrd6f9pf9p8ll3yawhfp50d8-gpm-1.20.7/sbin/gpm[409]: Started gpm successfully. Entered daemon mode.
694 May  3 16:03:46 localhost shepherd[1]: Service gpm has been started.
695 May  3 16:03:46 localhost shepherd[1]: Service term-tty1 has been started.
696 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty1 has been started.
697 May  3 16:03:46 localhost shepherd[1]: Service virtual-terminal has been started.
698 May  3 16:03:46 localhost shepherd[1]: Service term-tty2 has been started.
699 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty2 has been started.
700 May  3 16:03:46 localhost shepherd[1]: Service term-tty3 has been started.
701 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty3 has been started.
702 May  3 16:03:46 localhost shepherd[1]: Service term-tty4 has been started.
703 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty4 has been started.
704 May  3 16:03:46 localhost shepherd[1]: Service term-tty5 has been started.
705 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty5 has been started.
706 May  3 16:03:46 localhost shepherd[1]: Service term-tty6 has been started.
707 May  3 16:03:46 localhost shepherd[1]: Service console-font-tty6 has been started.
708 May  3 16:03:46 localhost shepherd[1]: Service guix-daemon has been started.
--8<---------------cut here---------------end--------------->8---

Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-03 19:31 bug#35550: Installer: wpa_supplicant fails to start Ludovic Courtès
@ 2019-05-03 20:51 ` Ludovic Courtès
  2019-05-04 15:28   ` Ludovic Courtès
                     ` (2 more replies)
  2019-05-03 21:26 ` Ricardo Wurmus
  2019-05-04  3:26 ` sirgazil
  2 siblings, 3 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-03 20:51 UTC (permalink / raw)
  To: 35550; +Cc: sirgazil

Ludovic Courtès <ludo@gnu.org> skribis:

> 678 May  3 16:03:40 localhost vmunix: [   10.221211] shepherd[1]: Service dbus-system has been started.
> 679 May  3 16:03:40 localhost vmunix: [   10.222093] shepherd[1]: Service loopback has been started.
> 680 May  3 16:03:40 localhost wpa_supplicant[398]: Successfully initialized wpa_supplicant
> 681 May  3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could not be started.
> 682 May  3 16:03:40 localhost shepherd[1]: Service networking depends on wpa-supplicant.
> 683 May  3 16:03:40 localhost shepherd[1]: Service networking could not be started.
> 684 May  3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not request service name: already registered
> 685 May  3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize wpa_supplicant
> 686 May  3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could not be started.

My guess is that it goes like this:

  1. shepherd starts ‘networking’, which triggers the start of
     ‘wpa-supplicant’ (PID 398), which immediately “fails”.  Thus
     ‘networking’ is not started.

  2. shepherd continues and starts ‘wpa-supplicant’ directly (PID 400).
     This time it fails for good; after 5 seconds, since the PID file
     didn’t show up, shepherd says again that it could not be started.

Indeed, by looking at shepherd.conf from:

  guix gc -R $(guix system build gnu/system/install.scm) | grep shepherd.conf

one can see that ‘networking’ comes before ‘wpa-supplicant’ in the expression:

  (for-each start '(… networking … wpa-supplicant …))

So why is ‘wpa-supplicant’ marked as failing to start on the first
attempt?

The only reason I can think of is if ‘read-pid-file’ from (shepherd
service) returns immediately and returns #f instead of a number.  That
can actually happen if the PID file exists but is empty (or contains
garbage).

You would expect wpa_supplicant to create its PID file atomically: write
it under a different name, then rename(2)… but no:

--8<---------------cut here---------------start------------->8---
int os_daemonize(const char *pid_file)
{
#if defined(__uClinux__) || defined(__sun__)
	return -1;
#else /* defined(__uClinux__) || defined(__sun__) */
	if (os_daemon(0, 0)) {
		perror("daemon");
		return -1;
	}

	if (pid_file) {
		FILE *f = fopen(pid_file, "w");
		if (f) {
			fprintf(f, "%u\n", getpid());
			fclose(f);
		}
	}

	return -0;
#endif /* defined(__uClinux__) || defined(__sun__) */
}
--8<---------------cut here---------------end--------------->8---

So there is a possibility, albeit unlikely, for shepherd to see the PID
file after it’s been open but before it’s been written to.  (This
problem is not limited to the installer.)

I’m not 100% convinced that this is what’s happening there but that’s
the only lead I have.  I’m surprised we haven’t seen other reports
before.

Thoughts?

Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-03 19:31 bug#35550: Installer: wpa_supplicant fails to start Ludovic Courtès
  2019-05-03 20:51 ` Ludovic Courtès
@ 2019-05-03 21:26 ` Ricardo Wurmus
  2019-05-04  3:26 ` sirgazil
  2 siblings, 0 replies; 23+ messages in thread
From: Ricardo Wurmus @ 2019-05-03 21:26 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: sirgazil, 35550


Ludovic Courtès <ludo@gnu.org> writes:

> Ricardo and sirgazil reported on IRC that the ‘wpa-supplicant’ would
> (sometimes?) fail to start for the in the installation image on their
> laptop.

What I observed was that the wpa-supplicant *service* would not start
and starting it manually would fail because a wpa_supplicant *process*
would already be running.

Upon killing the stray process the service could be started.

-- 
Ricardo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-03 19:31 bug#35550: Installer: wpa_supplicant fails to start Ludovic Courtès
  2019-05-03 20:51 ` Ludovic Courtès
  2019-05-03 21:26 ` Ricardo Wurmus
@ 2019-05-04  3:26 ` sirgazil
  2019-05-04 12:49   ` Ludovic Courtès
  2 siblings, 1 reply; 23+ messages in thread
From: sirgazil @ 2019-05-04  3:26 UTC (permalink / raw)
  To: 35550

[-- Attachment #1: Type: text/plain, Size: 317 bytes --]

In my case, "wpa-supplicant" would *always* fail to start (I rebooted and tried the graphical installer around 4 times).

Killing "wpa-supplicant" and then running "herd start networking" made it possible for the graphical installer to display the steps where you select Wireless or Wired, and then continue normally.

[-- Attachment #2: Type: text/html, Size: 731 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-04  3:26 ` sirgazil
@ 2019-05-04 12:49   ` Ludovic Courtès
  2019-05-04 14:07     ` pelzflorian (Florian Pelz)
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-04 12:49 UTC (permalink / raw)
  To: sirgazil; +Cc: 35550

Hi,

sirgazil@zoho.com skribis:

> In my case, "wpa-supplicant" would *always* fail to start (I rebooted and tried the graphical installer around 4 times).

Could you check whether the sequence of events in /var/log/messages
always looks like the snippet I showed?

Could you also:

  1. Let the installation image boot;

  2. Confirm with ‘herd status wpa-supplicant’ that the ‘wpa-supplicant’
     service is marked as stopped (failed to start);

  3. Show the contents /var/run/wpa_supplicant.pid, and check whether
     they correspond to the PID of a running ‘wpa_supplicant’ process.

?

Besides, do you have a Guix(SD) installation, recent or not, that uses
wpa_supplicant?  If so, does it suffer from the same problem
(‘wpa-supplicant’ service fails to start)?

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-04 12:49   ` Ludovic Courtès
@ 2019-05-04 14:07     ` pelzflorian (Florian Pelz)
  2019-05-04 15:32       ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: pelzflorian (Florian Pelz) @ 2019-05-04 14:07 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: sirgazil, 35550

On Sat, May 04, 2019 at 02:49:55PM +0200, Ludovic Courtès wrote:
> Hi,
> 
> sirgazil@zoho.com skribis:
> 
> > In my case, "wpa-supplicant" would *always* fail to start (I rebooted and tried the graphical installer around 4 times).
> 
> Could you check whether the sequence of events in /var/log/messages
> always looks like the snippet I showed?
> 

So for me it sometimes fails on two different machines.  Actually it
started failing four times in a row now that I test this.

When it fails, I also get

On Fri, May 03, 2019 at 09:31:40PM +0200, Ludovic Courtès wrote:
> 680 May  3 16:03:40 localhost wpa_supplicant[398]: Successfully initialized wpa_supplicant
> 681 May  3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could not be started.
> 682 May  3 16:03:40 localhost shepherd[1]: Service networking depends on wpa-supplicant.
> 683 May  3 16:03:40 localhost shepherd[1]: Service networking could not be started.
> 684 May  3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not request service name: already registered
> 685 May  3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize wpa_supplicant
> 686 May  3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could not be started.



> Could you also:
> 
>   1. Let the installation image boot;
> 
>   2. Confirm with ‘herd status wpa-supplicant’ that the ‘wpa-supplicant’
>      service is marked as stopped (failed to start);
>

$ herd status wpa-supplicant
Status of wpa-supplicant:
  It is stopped.
  It is enabled.
  Provides (wpa-supplicant).
  Requires (user-processes dbus-system loopback syslogd).
  Conflicts with ().
  Will be respawned.



>   3. Show the contents /var/run/wpa_supplicant.pid, and check whether
>      they correspond to the PID of a running ‘wpa_supplicant’ process.
>

There is no such file.

Regards,
Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-03 20:51 ` Ludovic Courtès
@ 2019-05-04 15:28   ` Ludovic Courtès
  2019-05-05 20:26   ` Ludovic Courtès
  2019-05-05 22:21   ` Ludovic Courtès
  2 siblings, 0 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-04 15:28 UTC (permalink / raw)
  To: 35550; +Cc: sirgazil

Ludovic Courtès <ludo@gnu.org> skribis:

> You would expect wpa_supplicant to create its PID file atomically: write
> it under a different name, then rename(2)… but no:

Shepherd commit 72631752149d000c8c98ae0cc66e0b0c2eda94ef changes
‘read-pid-file’ to better deal with this.

Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-04 14:07     ` pelzflorian (Florian Pelz)
@ 2019-05-04 15:32       ` Ludovic Courtès
  2019-05-04 15:47         ` pelzflorian (Florian Pelz)
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-04 15:32 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: sirgazil, 35550

Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

>> Could you also:
>> 
>>   1. Let the installation image boot;
>> 
>>   2. Confirm with ‘herd status wpa-supplicant’ that the ‘wpa-supplicant’
>>      service is marked as stopped (failed to start);
>>
>
> $ herd status wpa-supplicant
> Status of wpa-supplicant:
>   It is stopped.
>   It is enabled.
>   Provides (wpa-supplicant).
>   Requires (user-processes dbus-system loopback syslogd).
>   Conflicts with ().
>   Will be respawned.
>
>
>
>>   3. Show the contents /var/run/wpa_supplicant.pid, and check whether
>>      they correspond to the PID of a running ‘wpa_supplicant’ process.
>>
>
> There is no such file.

At this point, could you also check:

  3b. The output of “pgrep -fa wpa_supplicant”.

The PID file was probably removed by the second attempt to start the
‘wpa-supplicant’ service.

Could you also try running (as root):

  while herd restart wpa-supplicant; do : ; done

to see the frequency at which the service fails to start?

Thanks!

Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-04 15:32       ` Ludovic Courtès
@ 2019-05-04 15:47         ` pelzflorian (Florian Pelz)
  2019-05-04 21:14           ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: pelzflorian (Florian Pelz) @ 2019-05-04 15:47 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: sirgazil, 35550

On Sat, May 04, 2019 at 05:32:23PM +0200, Ludovic Courtès wrote:
> At this point, could you also check:
> 
>   3b. The output of “pgrep -fa wpa_supplicant”.
> 

root@gnu ~# pgrep -fa wpa_supplicant
481 /gnu/store/ifqy2dr2hbqplv14k50dzprw3k2bz9m9-wpa-supplicant-2.7/sbin/wpa_supplicant -P/var/run/wpa_supplicant.pid -B -s -u


> The PID file was probably removed by the second attempt to start the
> ‘wpa-supplicant’ service.
> 
> Could you also try running (as root):
> 
>   while herd restart wpa-supplicant; do : ; done
> 
> to see the frequency at which the service fails to start?
>

It displays

Service wpa-supplicant is not running.
Service wpa-supplicant could not be started.

after and again and again every approximately 5 seconds.

Regards,
Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-04 15:47         ` pelzflorian (Florian Pelz)
@ 2019-05-04 21:14           ` Ludovic Courtès
  2019-05-04 21:34             ` pelzflorian (Florian Pelz)
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-04 21:14 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: sirgazil, 35550

Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Sat, May 04, 2019 at 05:32:23PM +0200, Ludovic Courtès wrote:

[...]

>> Could you also try running (as root):
>> 
>>   while herd restart wpa-supplicant; do : ; done
>> 
>> to see the frequency at which the service fails to start?
>>
>
> It displays
>
> Service wpa-supplicant is not running.
> Service wpa-supplicant could not be started.

Sorry, to be complete, I should have asked you to first kill the
“dangling” wpa_supplicant process:

  pkill -f wpa_supplicant
  pgrep -fa wpa_supplicant  # make sure it’s really dead
  while herd restart wpa-supplicant; do : ; done

Could you try this?

TIA,
Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-04 21:14           ` Ludovic Courtès
@ 2019-05-04 21:34             ` pelzflorian (Florian Pelz)
  2019-05-05 16:29               ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: pelzflorian (Florian Pelz) @ 2019-05-04 21:34 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: sirgazil, 35550

On Sat, May 04, 2019 at 11:14:29PM +0200, Ludovic Courtès wrote:
> Hi,
> 
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> 
> > On Sat, May 04, 2019 at 05:32:23PM +0200, Ludovic Courtès wrote:
> 
> [...]
> 
> >> Could you also try running (as root):
> >> 
> >>   while herd restart wpa-supplicant; do : ; done
> >> 
> >> to see the frequency at which the service fails to start?
> >>
> >
> > It displays
> >
> > Service wpa-supplicant is not running.
> > Service wpa-supplicant could not be started.
> 
> Sorry, to be complete, I should have asked you to first kill the
> “dangling” wpa_supplicant process:
> 
>   pkill -f wpa_supplicant
>   pgrep -fa wpa_supplicant  # make sure it’s really dead

pgrep does not display anything.


>   while herd restart wpa-supplicant; do : ; done
> 
> Could you try this?
> 

Now it displays

Service wpa-supplicant has been stopped.
Service wpa-supplicant has been started.

again and again incessantly, one immediately after the other with no
time in between.

By the way, the network connection only failed sometimes when I tested
Guix 1.0.0, but since I started using a Guix disk image from a more
recent Guix, wpa-supplicant apparently always fails.  Strange.

Regards,
Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-04 21:34             ` pelzflorian (Florian Pelz)
@ 2019-05-05 16:29               ` Ludovic Courtès
  0 siblings, 0 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-05 16:29 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: sirgazil, 35550

Hello,

I’ve been able to test it on the bare metal as well and I confirm: I
booted the install image a few times and the ‘wpa-supplicant’ service
would be stopped every time, with a dangling ‘wpa_supplicant’ process
running.  Given that this seems to be systematic, I’m not sure my PID
file atomicity hypothesis explains everything.

I’ve tried running:

  pkill wpa_supplicant
  while herd start wpa-supplicant ; do herd stop wpa-supplicant ; done

and that always works (i.e., no “failed to start ‘wpa-supplicant’”
error.)

If I run in parallel this:

  while touch /var/run/wpa_supplicant.pid ; do : ; done

then it occasionally “fails to start”, but not very often.

To be continued…

Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-03 20:51 ` Ludovic Courtès
  2019-05-04 15:28   ` Ludovic Courtès
@ 2019-05-05 20:26   ` Ludovic Courtès
  2019-05-05 22:21   ` Ludovic Courtès
  2 siblings, 0 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-05 20:26 UTC (permalink / raw)
  To: 35550; +Cc: sirgazil

Ludovic Courtès <ludo@gnu.org> skribis:

> You would expect wpa_supplicant to create its PID file atomically: write
> it under a different name, then rename(2)… but no:

Regarding this PID file atomicity issue, a few other data points: Tor,
syslogd (GNU Inetutils), sshd (OpenSSH), and dbus-daemon all do the
exact same non-atomic PID file creation as wpa_supplicant.

Libdaemon (used by Avahi) does something more fancy: it acquires an
exclusive write lock (fcntl with F_SETLKW) on the PID file right after
opening it, but since shepherd does not attempt to acquire locks on PID
file, it has no effect.

systemd and GNU Pies don’t take any precautions while reading PID
files.  The assumption seems to be that if the file exists, then it must
contain a valid PID.

So perhaps the problem we see with ‘wpa_supplicant’ is elsewhere.

Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-03 20:51 ` Ludovic Courtès
  2019-05-04 15:28   ` Ludovic Courtès
  2019-05-05 20:26   ` Ludovic Courtès
@ 2019-05-05 22:21   ` Ludovic Courtès
  2019-05-06 19:47     ` pelzflorian (Florian Pelz)
  2019-05-06 20:00     ` Danny Milosavljevic
  2 siblings, 2 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-05 22:21 UTC (permalink / raw)
  To: 35550; +Cc: sirgazil

[-- Attachment #1: Type: text/plain, Size: 777 bytes --]

Ludovic Courtès <ludo@gnu.org> skribis:

> So why is ‘wpa-supplicant’ marked as failing to start on the first
> attempt?
>
> The only reason I can think of is if ‘read-pid-file’ from (shepherd
> service) returns immediately and returns #f instead of a number.  That
> can actually happen if the PID file exists but is empty (or contains
> garbage).

I’ve produced an ISO with the patch below and ran it on the bare metal
to get confirmation (too bad the bug doesn’t show in QEMU :-/).  Indeed,
‘read-pid-file’ for /var/run/wpa_supplicant.pid systematically reads the
empty string the first time the ‘wpa-supplicant’ service is started.

(After that, if we kill the process and try to restart the service, the
problem doesn’t show up.)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 734 bytes --]

diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index 53437b6..e21492e 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -717,9 +717,12 @@ otherwise return the number that was read (a PID)."
   (let loop ()
     (catch 'system-error
       (lambda ()
-        (string->number
-         (string-trim-both
-          (call-with-input-file file get-string-all))))
+        (define str
+          (call-with-input-file file get-string-all))
+
+        (local-output (l10n "read-pid-file ~s -> ~s")
+                      file str)
+        (string->number (string-trim-both str)))
       (lambda args
         (let ((errno (system-error-errno args)))
           (if (= ENOENT errno)

[-- Attachment #3: Type: text/plain, Size: 626 bytes --]


With the second patch below, I confirm that the ‘wpa-supplicant’ starts
correctly.  We can see in /var/log/messages that ‘read-pid-file’ first
reads the empty string from /var/run/wpa_supplicant.pid, then tries
again, and gets a valid PID on the second attempt.

It’s surprising that the timing is always like that, and only on the
bare metal, but that’s the way it is.

It’d be great if you could do some testing with the patch below.  Then I
guess we’ll push a Shepherd release with this fix.

I wonder if this could also explain
<https://issues.guix.info/issue/30993>.

Thanks,
Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: Type: text/x-patch, Size: 2540 bytes --]

diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scm
index dfc3467bf8..e1dd248679 100644
--- a/gnu/packages/admin.scm
+++ b/gnu/packages/admin.scm
@@ -188,7 +188,8 @@ and provides a \"top-like\" mode (monitoring).")
                                   version ".tar.gz"))
               (sha256
                (base32
-                "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i"))))
+                "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i"))
+              (patches (search-patches "shepherd-debug.patch"))))
     (build-system gnu-build-system)
     (arguments
      '(#:configure-flags '("--localstatedir=/var")))
diff --git a/gnu/packages/patches/shepherd-debug.patch b/gnu/packages/patches/shepherd-debug.patch
new file mode 100644
index 0000000000..2fd97cc578
--- /dev/null
+++ b/gnu/packages/patches/shepherd-debug.patch
@@ -0,0 +1,43 @@
+diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
+index 53437b6..bef8f42 100644
+--- a/modules/shepherd/service.scm
++++ b/modules/shepherd/service.scm
+@@ -715,21 +715,28 @@ number.  Return #f if FILE was not created or does not contain a number;
+ otherwise return the number that was read (a PID)."
+   (define start (current-time))
+   (let loop ()
++    (define (retry)
++      (and (< (current-time) (+ start max-delay))
++           (begin
++             ;; FILE does not exist yet, so wait and try again.
++             ;; XXX: Ideally we would yield to the main event loop
++             ;; and/or use inotify.
++             (sleep 1)
++             (loop))))
++
+     (catch 'system-error
+       (lambda ()
+-        (string->number
+-         (string-trim-both
+-          (call-with-input-file file get-string-all))))
++        (define str
++          (call-with-input-file file get-string-all))
++
++        (local-output (l10n "read-pid-file ~s -> ~s")
++                      file str)
++        (or (string->number (string-trim-both str))
++            (retry)))
+       (lambda args
+         (let ((errno (system-error-errno args)))
+           (if (= ENOENT errno)
+-              (and (< (current-time) (+ start max-delay))
+-                   (begin
+-                     ;; FILE does not exist yet, so wait and try again.
+-                     ;; XXX: Ideally we would yield to the main event loop
+-                     ;; and/or use inotify.
+-                     (sleep 1)
+-                     (loop)))
++              (retry)
+               (apply throw args)))))))
+ 
+ (define* (exec-command command

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-05 22:21   ` Ludovic Courtès
@ 2019-05-06 19:47     ` pelzflorian (Florian Pelz)
  2019-05-07  7:53       ` Ludovic Courtès
  2019-05-06 20:00     ` Danny Milosavljevic
  1 sibling, 1 reply; 23+ messages in thread
From: pelzflorian (Florian Pelz) @ 2019-05-06 19:47 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: sirgazil, 35550

On Mon, May 06, 2019 at 12:21:26AM +0200, Ludovic Courtès wrote:
> It’d be great if you could do some testing with the patch below.  Then I
> guess we’ll push a Shepherd release with this fix.
> 

The second patch (the one for Guix is the only one I have tested)
works fine on the system which always failed before since sometime
after 1.0.

One time wpa-supplicant worked fine but it had an error establishing
the internet connection, but I am quite certain it was an unrelated
hardware issue.

Regards,
Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-05 22:21   ` Ludovic Courtès
  2019-05-06 19:47     ` pelzflorian (Florian Pelz)
@ 2019-05-06 20:00     ` Danny Milosavljevic
  2019-05-07  8:05       ` Ludovic Courtès
  1 sibling, 1 reply; 23+ messages in thread
From: Danny Milosavljevic @ 2019-05-06 20:00 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: sirgazil, 35550

[-- Attachment #1: Type: text/plain, Size: 129 bytes --]

Hi Ludo,

what happens when the loop reads the pid file when it contains just half of a
numeral?  It won't detect it, right?

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-06 19:47     ` pelzflorian (Florian Pelz)
@ 2019-05-07  7:53       ` Ludovic Courtès
  0 siblings, 0 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-07  7:53 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: sirgazil, 35550

Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Mon, May 06, 2019 at 12:21:26AM +0200, Ludovic Courtès wrote:
>> It’d be great if you could do some testing with the patch below.  Then I
>> guess we’ll push a Shepherd release with this fix.
>> 
>
> The second patch (the one for Guix is the only one I have tested)
> works fine on the system which always failed before since sometime
> after 1.0.
>
> One time wpa-supplicant worked fine but it had an error establishing
> the internet connection, but I am quite certain it was an unrelated
> hardware issue.

Awesome, thanks for testing.

Note that I made similar changes in shepherd, which I’ll probably
release as 0.6.1 so we can use it in Guix 1.0.1:

  https://git.savannah.gnu.org/cgit/shepherd.git/log

Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-06 20:00     ` Danny Milosavljevic
@ 2019-05-07  8:05       ` Ludovic Courtès
  2019-05-08 14:32         ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-07  8:05 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: sirgazil, 35550

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

Hi Danny,

Danny Milosavljevic <dannym@scratchpost.org> skribis:

> what happens when the loop reads the pid file when it contains just half of a
> numeral?  It won't detect it, right?

Correct.

I’m proposing the addition below to be on the verrrry safe side.
WDYT?

The weird thing, as I mentioned earlier, is that systemd and Pies do not
protect against truncated PID files, and I couldn’t find any
“documentation” of the problem on the intertubes.  For systemd it’s
maybe less of a problem since services are started in a cgroup, I think.

Thanks,
Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 2388 bytes --]

diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index b937609..ef27d88 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -709,10 +709,16 @@ results."
 set when starting a service."
   (environ))
 
-(define* (read-pid-file file #:key (max-delay 5))
+(define* (read-pid-file file #:key (max-delay 5)
+                        (validate-pid? #f))
   "Wait for MAX-DELAY seconds for FILE to show up, and read its content as a
 number.  Return #f if FILE was not created or does not contain a number;
-otherwise return the number that was read (a PID)."
+otherwise return the number that was read (a PID).
+
+When VALIDATE-PID? is true, succeed if and only if the number that was read is
+the PID of an existing process in the current PID namespace.  This test cannot
+be used if FILE might contain a PID from another PID namespace (i.e., the
+daemon writing FILE is running in a separate PID namespace.)"
   (define start (current-time))
 
   (let loop ()
@@ -736,11 +742,13 @@ otherwise return the number that was read (a PID)."
            (try-again))
           ((? integer? pid)
            ;; It's possible, though unlikely, that PID is not a valid PID, for
-           ;; instance because writes to FILE did not complete.  However, we
-           ;; don't do (kill pid 0) because if the process lives in a separate
-           ;; PID namespace, then PID is probably invalid in our own
-           ;; namespace.
-           pid)))
+           ;; instance because writes to FILE did not complete.  When
+           ;; VALIDATE-PID? is true, check that PID is valid in the current
+           ;; PID namespace.
+           (if (or (not validate-pid?)
+                   (catch-system-error (kill pid 0) #t))
+               pid
+               (try-again)))))
       (lambda args
         (let ((errno (system-error-errno args)))
           (if (= ENOENT errno)
@@ -931,7 +939,8 @@ start."
                                         environment-variables)))
             (if pid-file
                 (match (read-pid-file pid-file
-                                      #:max-delay pid-file-timeout)
+                                      #:max-delay pid-file-timeout
+                                      #:validate-pid? #t)
                   (#f
                    (catch-system-error (kill pid SIGTERM))
                    #f)

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-07  8:05       ` Ludovic Courtès
@ 2019-05-08 14:32         ` Ludovic Courtès
  2019-05-08 22:25           ` Ludovic Courtès
  2019-05-11 18:13           ` Ludovic Courtès
  0 siblings, 2 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-08 14:32 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: sirgazil, 35550

Ludovic Courtès <ludo@gnu.org> skribis:

> I’m proposing the addition below to be on the verrrry safe side.

Pushed as Shepherd commit 1ebea0a6e4c6ff11212eda348072acf9c379e7b2.

Unless something goes wrong, I’ll make Shepherd commit
fbb9c3fac745552eaf0f354bd0134cca7027bf17 version 0.6.1 in the coming
days (well, with an updated NEWS file.)

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-08 14:32         ` Ludovic Courtès
@ 2019-05-08 22:25           ` Ludovic Courtès
  2019-05-09  6:46             ` pelzflorian (Florian Pelz)
  2019-05-11 18:13           ` Ludovic Courtès
  1 sibling, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-08 22:25 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: sirgazil, 35550

[-- Attachment #1: Type: text/plain, Size: 573 bytes --]

Ludovic Courtès <ludo@gnu.org> skribis:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> I’m proposing the addition below to be on the verrrry safe side.
>
> Pushed as Shepherd commit 1ebea0a6e4c6ff11212eda348072acf9c379e7b2.
>
> Unless something goes wrong, I’ll make Shepherd commit
> fbb9c3fac745552eaf0f354bd0134cca7027bf17 version 0.6.1 in the coming
> days (well, with an updated NEWS file.)

If you want to re-test that wpa_supplicant does start correctly with the
latest Shepherd, you can do so by applying the patch below to Guix.

Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-org, Size: 1606 bytes --]

diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scm
index dfc3467bf8..6cabea0fb9 100644
--- a/gnu/packages/admin.scm
+++ b/gnu/packages/admin.scm
@@ -181,22 +181,30 @@ and provides a \"top-like\" mode (monitoring).")
 (define-public shepherd
   (package
     (name "shepherd")
-    (version "0.6.0")
+    (version "0.6.1pre1")
     (source (origin
-              (method url-fetch)
-              (uri (string-append "mirror://gnu/shepherd/shepherd-"
-                                  version ".tar.gz"))
+              (method git-fetch)
+              (uri (git-reference
+                    (url "https://git.savannah.gnu.org/git/shepherd.git")
+                    (commit "fbb9c3fac745552eaf0f354bd0134cca7027bf17")))
               (sha256
                (base32
-                "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i"))))
+                "0w7dx2i04qgykzhz4qir3rxnrdfza3q7kzh5z4mpmgjrp08pfnrn"))))
     (build-system gnu-build-system)
     (arguments
-     '(#:configure-flags '("--localstatedir=/var")))
+     '(#:configure-flags '("--localstatedir=/var"
+                           "ac_cv_path_MSGMERGE=true")))
     (native-inputs
      `(("pkg-config" ,pkg-config)
 
        ;; This is the Guile we use as a cross-compiler...
-       ("guile" ,guile-2.2)))
+       ("guile" ,guile-2.2)
+
+       ("texinfo" ,texinfo)
+       ("help2man" ,help2man)
+       ("gettext" ,gnu-gettext)
+       ("autoconf" ,autoconf)
+       ("automake" ,automake)))
     (inputs
      ;; ... and this is the one that appears in shebangs when cross-compiling.
      `(("guile" ,guile-2.2)

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-08 22:25           ` Ludovic Courtès
@ 2019-05-09  6:46             ` pelzflorian (Florian Pelz)
  2019-05-09 10:18               ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: pelzflorian (Florian Pelz) @ 2019-05-09  6:46 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: sirgazil, 35550

On Thu, May 09, 2019 at 12:25:59AM +0200, Ludovic Courtès wrote:
> If you want to re-test that wpa_supplicant does start correctly with the
> latest Shepherd, you can do so by applying the patch below to Guix.
> 

Works fine too on Macbook and PC (using a non-ISO disk image).

Regards,
Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-09  6:46             ` pelzflorian (Florian Pelz)
@ 2019-05-09 10:18               ` Ludovic Courtès
  0 siblings, 0 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-09 10:18 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: sirgazil, 35550

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Thu, May 09, 2019 at 12:25:59AM +0200, Ludovic Courtès wrote:
>> If you want to re-test that wpa_supplicant does start correctly with the
>> latest Shepherd, you can do so by applying the patch below to Guix.
>> 
>
> Works fine too on Macbook and PC (using a non-ISO disk image).

Great, thanks for testing!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* bug#35550: Installer: wpa_supplicant fails to start
  2019-05-08 14:32         ` Ludovic Courtès
  2019-05-08 22:25           ` Ludovic Courtès
@ 2019-05-11 18:13           ` Ludovic Courtès
  1 sibling, 0 replies; 23+ messages in thread
From: Ludovic Courtès @ 2019-05-11 18:13 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: sirgazil, 35550-done

Ludovic Courtès <ludo@gnu.org> skribis:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> I’m proposing the addition below to be on the verrrry safe side.
>
> Pushed as Shepherd commit 1ebea0a6e4c6ff11212eda348072acf9c379e7b2.
>
> Unless something goes wrong, I’ll make Shepherd commit
> fbb9c3fac745552eaf0f354bd0134cca7027bf17 version 0.6.1 in the coming
> days (well, with an updated NEWS file.)

Commit 3b8699f9c2a9f88d44a5320398b4c968c432429d upgrades the Shepherd to
0.6.1, which fixes the PID file race condition we’ve been talking about.

Closing!

Ludo’.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-05-11 18:14 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-05-03 19:31 bug#35550: Installer: wpa_supplicant fails to start Ludovic Courtès
2019-05-03 20:51 ` Ludovic Courtès
2019-05-04 15:28   ` Ludovic Courtès
2019-05-05 20:26   ` Ludovic Courtès
2019-05-05 22:21   ` Ludovic Courtès
2019-05-06 19:47     ` pelzflorian (Florian Pelz)
2019-05-07  7:53       ` Ludovic Courtès
2019-05-06 20:00     ` Danny Milosavljevic
2019-05-07  8:05       ` Ludovic Courtès
2019-05-08 14:32         ` Ludovic Courtès
2019-05-08 22:25           ` Ludovic Courtès
2019-05-09  6:46             ` pelzflorian (Florian Pelz)
2019-05-09 10:18               ` Ludovic Courtès
2019-05-11 18:13           ` Ludovic Courtès
2019-05-03 21:26 ` Ricardo Wurmus
2019-05-04  3:26 ` sirgazil
2019-05-04 12:49   ` Ludovic Courtès
2019-05-04 14:07     ` pelzflorian (Florian Pelz)
2019-05-04 15:32       ` Ludovic Courtès
2019-05-04 15:47         ` pelzflorian (Florian Pelz)
2019-05-04 21:14           ` Ludovic Courtès
2019-05-04 21:34             ` pelzflorian (Florian Pelz)
2019-05-05 16:29               ` Ludovic Courtès

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.