From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Subject: bug#35550: Installer: wpa_supplicant fails to start Date: Fri, 03 May 2019 22:51:31 +0200 Message-ID: <875zqr8dnw.fsf@gnu.org> References: <87sgtv8hcz.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([209.51.188.92]:50727) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hMfA6-0000MM-W2 for bug-guix@gnu.org; Fri, 03 May 2019 16:52:08 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hMfA5-0008F2-S3 for bug-guix@gnu.org; Fri, 03 May 2019 16:52:06 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:36217) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1hMfA4-0008EJ-H8 for bug-guix@gnu.org; Fri, 03 May 2019 16:52:05 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1hMfA1-0004nZ-Pj for bug-guix@gnu.org; Fri, 03 May 2019 16:52:04 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <87sgtv8hcz.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Fri, 03 May 2019 21:31:40 +0200") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: 35550@debbugs.gnu.org Cc: sirgazil Ludovic Court=C3=A8s skribis: > 678 May 3 16:03:40 localhost vmunix: [ 10.221211] shepherd[1]: Service= dbus-system has been started. > 679 May 3 16:03:40 localhost vmunix: [ 10.222093] shepherd[1]: Service= loopback has been started. > 680 May 3 16:03:40 localhost wpa_supplicant[398]: Successfully initializ= ed wpa_supplicant > 681 May 3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could n= ot be started. > 682 May 3 16:03:40 localhost shepherd[1]: Service networking depends on = wpa-supplicant. > 683 May 3 16:03:40 localhost shepherd[1]: Service networking could not b= e started. > 684 May 3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not reques= t service name: already registered > 685 May 3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize w= pa_supplicant > 686 May 3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could n= ot be started. My guess is that it goes like this: 1. shepherd starts =E2=80=98networking=E2=80=99, which triggers the start= of =E2=80=98wpa-supplicant=E2=80=99 (PID 398), which immediately =E2=80= =9Cfails=E2=80=9D. Thus =E2=80=98networking=E2=80=99 is not started. 2. shepherd continues and starts =E2=80=98wpa-supplicant=E2=80=99 directl= y (PID 400). This time it fails for good; after 5 seconds, since the PID file didn=E2=80=99t show up, shepherd says again that it could not be start= ed. Indeed, by looking at shepherd.conf from: guix gc -R $(guix system build gnu/system/install.scm) | grep shepherd.co= nf one can see that =E2=80=98networking=E2=80=99 comes before =E2=80=98wpa-sup= plicant=E2=80=99 in the expression: (for-each start '(=E2=80=A6 networking =E2=80=A6 wpa-supplicant =E2=80=A6= )) So why is =E2=80=98wpa-supplicant=E2=80=99 marked as failing to start on th= e first attempt? The only reason I can think of is if =E2=80=98read-pid-file=E2=80=99 from (= shepherd service) returns immediately and returns #f instead of a number. That can actually happen if the PID file exists but is empty (or contains garbage). You would expect wpa_supplicant to create its PID file atomically: write it under a different name, then rename(2)=E2=80=A6 but no: --8<---------------cut here---------------start------------->8--- int os_daemonize(const char *pid_file) { #if defined(__uClinux__) || defined(__sun__) return -1; #else /* defined(__uClinux__) || defined(__sun__) */ if (os_daemon(0, 0)) { perror("daemon"); return -1; } if (pid_file) { FILE *f =3D fopen(pid_file, "w"); if (f) { fprintf(f, "%u\n", getpid()); fclose(f); } } return -0; #endif /* defined(__uClinux__) || defined(__sun__) */ } --8<---------------cut here---------------end--------------->8--- So there is a possibility, albeit unlikely, for shepherd to see the PID file after it=E2=80=99s been open but before it=E2=80=99s been written to. = (This problem is not limited to the installer.) I=E2=80=99m not 100% convinced that this is what=E2=80=99s happening there = but that=E2=80=99s the only lead I have. I=E2=80=99m surprised we haven=E2=80=99t seen other = reports before. Thoughts? Ludo=E2=80=99.