From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?Q?P=C3=A1draig?= Brady Subject: bug#21459: Race condition in tests/tail-2/assert.sh Date: Fri, 11 Sep 2015 21:34:25 +0100 Message-ID: <55F33AD1.3080502@draigBrady.com> References: <87wpvw2ad8.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Return-path: Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <87wpvw2ad8.fsf@gnu.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-coreutils-bounces+gcgcb-bug-coreutils-616=gmane.org@gnu.org Sender: bug-coreutils-bounces+gcgcb-bug-coreutils-616=gmane.org@gnu.org To: Ludovic =?UTF-8?Q?Court=C3=A8s?= , 21459@debbugs.gnu.org Cc: bug-guix@gnu.org List-Id: bug-guix.gnu.org On 11/09/15 17:23, Ludovic Courtès wrote: > Hello, > > We have observed intermittent failures of tests/tail-2/assert.sh > (Coreutils 8.24, libc 2.22), especially showing up on relatively slow > machines (armhf and mips64el.) > > The failure is with ‘tail --follow=name’, which, in inotify mode, would > fail to report that file ‘foo’ has been deleted. > > The strace of a correct execution (x86_64) is like this: > > --8<---------------cut here---------------start------------->8--- > lstat("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0 > lstat("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > inotify_init() = 5 > write(1, "==> a <==\nx\n\n==> foo <==\n", 25) = 25 > inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1 > inotify_add_watch(5, "a", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 2 > inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1 > inotify_add_watch(5, "foo", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 3 > open("a", O_RDONLY|O_NONBLOCK) = 6 > lstat("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0 > fstat(6, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0 > fstatfs(6, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=16738210, f_bfree=1219533, f_bavail=367617, f_files=4259840, f_ffree=2406492, f_fsid={1622537548, 1497272261}, f_namelen=255, f_frsize=4096}) = 0 > close(6) = 0 > fstat(3, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0 > open("foo", O_RDONLY|O_NONBLOCK) = 6 > lstat("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > fstat(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > fstatfs(6, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=16738210, f_bfree=1219533, f_bavail=367617, f_files=4259840, f_ffree=2406492, f_fsid={1622537548, 1497272261}, f_namelen=255, f_frsize=4096}) = 0 > close(6) = 0 > fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > read(5, "\3\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0", 20) = 16 > open("foo", O_RDONLY|O_NONBLOCK) = -1 ENOENT (No such file or directory) > lstat("foo", 0x7ffee174c3b0) = -1 ENOENT (No such file or directory) > write(2, "tail: ", 6) = 6 > write(2, "foo", 3) = 3 > write(2, ": No such file or directory", 27) = 27 > write(2, "\n", 1) = 1 > close(4) = 0 > read(5, "\3\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0", 20) = 16 > inotify_rm_watch(5, 3) = -1 EINVAL (Invalid argument) > --8<---------------cut here---------------end--------------->8--- > > For a *failing* execution (armhf), we get: > > --8<---------------cut here---------------start------------->8--- > lstat64("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0 > lstat64("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > inotify_init() = 5 > write(1, "==> a <==\nx\n\n==> foo <==\n", 25) = 25 > inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1 > inotify_add_watch(5, "a", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 2 > inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1 > inotify_add_watch(5, "foo", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 3 > open("a", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 6 > lstat64("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0 > fstat64(6, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0 > fstatfs64(6, 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=37378337, f_bfree=36178942, f_bavail=34274471, f_files=9502720, f_ffree=9405759, f_fsid={1592050960, 1812457140}, f_namelen=255, f_frsize=4096, f_flags=4128}) = 0 > close(6) = 0 > fstat64(3, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0 > open("foo", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 6 > lstat64("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > fstat64(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > fstatfs64(6, 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=37378337, f_bfree=36178942, f_bavail=34274471, f_files=9502720, f_ffree=9405759, f_fsid={1592050960, 1812457140}, f_namelen=255, f_frsize=4096, f_flags=4128}) = 0 > close(6) = 0 > fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > read(5, "\3\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0", 20) = 16 > open("foo", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 6 > lstat64("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > fstat64(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > fstatfs64(6, 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=37378337, f_bfree=36178938, f_bavail=34274467, f_files=9502720, f_ffree=9405759, f_fsid={1592050960, 1812457140}, f_namelen=255, f_frsize=4096, f_flags=4128}) = 0 > close(6) = 0 > read(5, > --8<---------------cut here---------------end--------------->8--- > > In both cases, reading from the inotify file descriptor (number 5) > returns a notification for watch #3 (corresponding to ‘foo’), with mask > IN_ATTRIB (value 4). > > However, the open("foo") call that immediately follows does *not* return > ENOENT in the failing case: The file is still there. > > The kernel’s ‘vfs_unlink’ goes like this: > > fsnotify_link_count(target); /* IN_ATTRIB */ > d_delete(dentry); /* fsnotify_nameremove → IN_DELETE for “.” */ > > So, ‘tail’ first receives the IN_ATTRIB notification corresponding to > the link count decrease on ‘foo’; at that point, ‘foo’ is still > available. And then, ‘tail’ should receive the IN_DELETE_SELF > notification, at which point ‘foo’ would have been actually been > unlinked. But in practice we don’t seem to be receiving IN_DELETE_SELF, > even in the succeeding case. > > I think the problem happens when ‘tail’ opens ‘foo’ right in between of > the two notifications: ‘foo’ is still there, and so ‘tail’ doesn’t > report anything. > > Does that make sense? Thanks for looking into it in such detail. IN_DELETE_SELF is only sent when the inode is removed, not on the unlink itself (as tail has the file open). One of the awkward gotchas with inotify being inode oriented. I.E. I wonder is this a cache coherency issue, with tail referencing the stale inode in the open() before the dcache is updated. It'll be a while before I can look at this. thanks again, Pádraig.