From mboxrd@z Thu Jan 1 00:00:00 1970 From: ludo@gnu.org (Ludovic =?UTF-8?Q?Court=C3=A8s?=) Subject: bug#21459: Race condition in tests/tail-2/assert.sh Date: Fri, 11 Sep 2015 18:23:31 +0200 Message-ID: <87wpvw2ad8.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Sender: "Debbugs-submit" Resent-Message-ID: Received: from eggs.gnu.org ([2001:4830:134:3::10]:37284) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZaR6w-0006TY-GR for bug-guix@gnu.org; Fri, 11 Sep 2015 12:23:39 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZaR6s-0004TC-Id for bug-guix@gnu.org; Fri, 11 Sep 2015 12:23:38 -0400 List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-coreutils-bounces+gcgcb-bug-coreutils-616=gmane.org@gnu.org Sender: bug-coreutils-bounces+gcgcb-bug-coreutils-616=gmane.org@gnu.org To: 21459@debbugs.gnu.org Cc: bug-guix@gnu.org List-Id: bug-guix.gnu.org Hello, We have observed intermittent failures of tests/tail-2/assert.sh (Coreutils 8.24, libc 2.22), especially showing up on relatively slow machines (armhf and mips64el.) The failure is with =E2=80=98tail --follow=3Dname=E2=80=99, which, in inoti= fy mode, would fail to report that file =E2=80=98foo=E2=80=99 has been deleted. The strace of a correct execution (x86_64) is like this: --8<---------------cut here---------------start------------->8--- lstat("a", {st_mode=3DS_IFREG|0644, st_size=3D2, ...}) =3D 0 lstat("foo", {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 inotify_init() =3D 5 write(1, "=3D=3D> a <=3D=3D\nx\n\n=3D=3D> foo <=3D=3D\n", 25) =3D 25 inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) =3D 1 inotify_add_watch(5, "a", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = =3D 2 inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) =3D 1 inotify_add_watch(5, "foo", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF= ) =3D 3 open("a", O_RDONLY|O_NONBLOCK) =3D 6 lstat("a", {st_mode=3DS_IFREG|0644, st_size=3D2, ...}) =3D 0 fstat(6, {st_mode=3DS_IFREG|0644, st_size=3D2, ...}) =3D 0 fstatfs(6, {f_type=3D"EXT2_SUPER_MAGIC", f_bsize=3D4096, f_blocks=3D1673821= 0, f_bfree=3D1219533, f_bavail=3D367617, f_files=3D4259840, f_ffree=3D24064= 92, f_fsid=3D{1622537548, 1497272261}, f_namelen=3D255, f_frsize=3D4096}) = =3D 0 close(6) =3D 0 fstat(3, {st_mode=3DS_IFREG|0644, st_size=3D2, ...}) =3D 0 open("foo", O_RDONLY|O_NONBLOCK) =3D 6 lstat("foo", {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 fstat(6, {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 fstatfs(6, {f_type=3D"EXT2_SUPER_MAGIC", f_bsize=3D4096, f_blocks=3D1673821= 0, f_bfree=3D1219533, f_bavail=3D367617, f_files=3D4259840, f_ffree=3D24064= 92, f_fsid=3D{1622537548, 1497272261}, f_namelen=3D255, f_frsize=3D4096}) = =3D 0 close(6) =3D 0 fstat(4, {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 read(5, "\3\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0", 20) =3D 16 open("foo", O_RDONLY|O_NONBLOCK) =3D -1 ENOENT (No such file or dire= ctory) lstat("foo", 0x7ffee174c3b0) =3D -1 ENOENT (No such file or dire= ctory) write(2, "tail: ", 6) =3D 6 write(2, "foo", 3) =3D 3 write(2, ": No such file or directory", 27) =3D 27 write(2, "\n", 1) =3D 1 close(4) =3D 0 read(5, "\3\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0", 20) =3D 16 inotify_rm_watch(5, 3) =3D -1 EINVAL (Invalid argument) --8<---------------cut here---------------end--------------->8--- For a *failing* execution (armhf), we get: --8<---------------cut here---------------start------------->8--- lstat64("a", {st_mode=3DS_IFREG|0644, st_size=3D2, ...}) =3D 0 lstat64("foo", {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 inotify_init() =3D 5 write(1, "=3D=3D> a <=3D=3D\nx\n\n=3D=3D> foo <=3D=3D\n", 25) =3D 25 inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) =3D 1 inotify_add_watch(5, "a", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = =3D 2 inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) =3D 1 inotify_add_watch(5, "foo", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF= ) =3D 3 open("a", O_RDONLY|O_NONBLOCK|O_LARGEFILE) =3D 6 lstat64("a", {st_mode=3DS_IFREG|0644, st_size=3D2, ...}) =3D 0 fstat64(6, {st_mode=3DS_IFREG|0644, st_size=3D2, ...}) =3D 0 fstatfs64(6, 88, {f_type=3D"EXT2_SUPER_MAGIC", f_bsize=3D4096, f_blocks=3D3= 7378337, f_bfree=3D36178942, f_bavail=3D34274471, f_files=3D9502720, f_ffre= e=3D9405759, f_fsid=3D{1592050960, 1812457140}, f_namelen=3D255, f_frsize= =3D4096, f_flags=3D4128}) =3D 0 close(6) =3D 0 fstat64(3, {st_mode=3DS_IFREG|0644, st_size=3D2, ...}) =3D 0 open("foo", O_RDONLY|O_NONBLOCK|O_LARGEFILE) =3D 6 lstat64("foo", {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 fstat64(6, {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 fstatfs64(6, 88, {f_type=3D"EXT2_SUPER_MAGIC", f_bsize=3D4096, f_blocks=3D3= 7378337, f_bfree=3D36178942, f_bavail=3D34274471, f_files=3D9502720, f_ffre= e=3D9405759, f_fsid=3D{1592050960, 1812457140}, f_namelen=3D255, f_frsize= =3D4096, f_flags=3D4128}) =3D 0 close(6) =3D 0 fstat64(4, {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 read(5, "\3\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0", 20) =3D 16 open("foo", O_RDONLY|O_NONBLOCK|O_LARGEFILE) =3D 6 lstat64("foo", {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 fstat64(6, {st_mode=3DS_IFREG|0644, st_size=3D0, ...}) =3D 0 fstatfs64(6, 88, {f_type=3D"EXT2_SUPER_MAGIC", f_bsize=3D4096, f_blocks=3D3= 7378337, f_bfree=3D36178938, f_bavail=3D34274467, f_files=3D9502720, f_ffre= e=3D9405759, f_fsid=3D{1592050960, 1812457140}, f_namelen=3D255, f_frsize= =3D4096, f_flags=3D4128}) =3D 0 close(6) =3D 0 read(5, --8<---------------cut here---------------end--------------->8--- In both cases, reading from the inotify file descriptor (number 5) returns a notification for watch #3 (corresponding to =E2=80=98foo=E2=80=99= ), with mask IN_ATTRIB (value 4). However, the open("foo") call that immediately follows does *not* return ENOENT in the failing case: The file is still there. The kernel=E2=80=99s =E2=80=98vfs_unlink=E2=80=99 goes like this: fsnotify_link_count(target); /* IN_ATTRIB */ d_delete(dentry); /* fsnotify_nameremove =E2=86=92 IN_DELETE for = =E2=80=9C.=E2=80=9D */ So, =E2=80=98tail=E2=80=99 first receives the IN_ATTRIB notification corres= ponding to the link count decrease on =E2=80=98foo=E2=80=99; at that point, =E2=80=98f= oo=E2=80=99 is still available. And then, =E2=80=98tail=E2=80=99 should receive the IN_DELETE_S= ELF notification, at which point =E2=80=98foo=E2=80=99 would have been actually= been unlinked. But in practice we don=E2=80=99t seem to be receiving IN_DELETE_= SELF, even in the succeeding case. I think the problem happens when =E2=80=98tail=E2=80=99 opens =E2=80=98foo= =E2=80=99 right in between of the two notifications: =E2=80=98foo=E2=80=99 is still there, and so =E2=80= =98tail=E2=80=99 doesn=E2=80=99t report anything. Does that make sense? Thanks, Ludo=E2=80=99.