From: Pádraig Brady
Subject: bug#21460: Race condition in tests/tail-2/assert.sh
Date: Fri, 11 Sep 2015 23:49:58 +0100
Message-ID: <55F35A96.2090906@draigBrady.com>
To: Ludovic Courtès, Paul Eggert
Cc: 21460@debbugs.gnu.org, bug-guix@gnu.org

On 11/09/15 21:55, Ludovic Courtès wrote:
> Paul Eggert wrote:
>
>> Ludovic Courtès wrote:
>>> I think the problem happens when ‘tail’ opens ‘foo’ right in between of
>>> the two notifications: ‘foo’ is still there, and so ‘tail’ doesn’t
>>> report anything.
>>>
>>> Does that make sense?
>>
>> Yes, though if the link count is indeed zero, I'm surprised that
>> 'tail' can open the file -- that sounds like a bug in the kernel.
>
> Attached is a reproducer; just run it in a loop for a couple of seconds:
>
> --8<---------------cut here---------------start------------->8---
> $ while ./a.out ; do : ; done
> funny, errno = Success, nlink = 0
> Aborted (core dumped)
> --8<---------------cut here---------------end--------------->8---
>
> I’m not sure if that’s a kernel bug.
> Strictly speaking, inotify works
> as expected: we get a notification for nlink--, which doesn’t mean the
> file has vanished.

Interesting. It does seem that the IN_ATTRIB is sent
before the st_nlink-- takes effect? That could be a bug.
Or it could be a dcache coherency issue where the
name still references the st_nlink==0 inode.

Note that recheck() just open()s and close()s the file in this case,
but since it doesn't close() the original fd, there will
be no IN_DELETE_SELF event.

If the above kernel behavior can be explained and is acceptable,
I suppose we could augment recheck() with something like:

diff --git a/src/tail.c b/src/tail.c
index f916d74..e9d5337 100644
--- a/src/tail.c
+++ b/src/tail.c
@@ -1046,6 +1046,19 @@ recheck (struct File_spec *f, bool blocking)
       close_fd (f->fd, pretty_name (f));
     }
+  else if (new_stats.st_nlink == 0) /* XXX: what about multi-linked files?  */
+    {
+      /* It was seen on Linux that a file could be opened
+         even though unlinked, as the directory entry (cache)
+         is updated after the IN_ATTRIB is sent for the nlink--.  */
+
+      error (0, f->errnum, _("%s has become inaccessible"),
+             quote (pretty_name (f)));
+
+      close_fd (fd, pretty_name (f));
+      close_fd (f->fd, pretty_name (f));
+      f->fd = -1;
+    }
   else
     {
       /* No changes detected, so close new fd.  */

> The conclusion for ‘tail’ would be to wait for the IN_DELETE_SELF event
> before considering the file to be gone.  WDYT?

As mentioned above, tail references the file until it can't open it,
so the IN_DELETE_SELF is only generated upon the close_fd(f->fd) above.

thanks,
Pádraig.
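
The st_nlink == 0 condition that the patch keys on can be observed in
isolation, without inotify. The following is a minimal sketch, assuming
Linux (or another POSIX system) and a local filesystem under /tmp; some
network filesystems may report a nonzero link count for an unlinked but
still open file. fd_is_unlinked and open_unlinked_temp are hypothetical
helpers for illustration and are not part of tail.c. This does not
reproduce the race itself, only the state that recheck() would see.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of tail.c: return true if the file
   behind FD has no remaining directory entries (st_nlink == 0).  */
static bool
fd_is_unlinked (int fd)
{
  struct stat st;

  if (fstat (fd, &st) != 0)
    return false;               /* Cannot tell; assume still linked.  */
  return st.st_nlink == 0;
}

/* Hypothetical helper: create a temporary file, remove its only name
   while keeping the descriptor open, and return that descriptor.
   Returns -1 on failure.  */
static int
open_unlinked_temp (void)
{
  char name[] = "/tmp/nlink-demo-XXXXXX";
  int fd = mkstemp (name);

  if (fd < 0)
    return -1;
  if (unlink (name) != 0)
    {
      close (fd);
      return -1;
    }
  return fd;
}
```

A caller would obtain a descriptor from open_unlinked_temp () and pass it
to fd_is_unlinked (); the open descriptor keeps the inode alive even
though no name refers to it, which is why tail has to close f->fd before
the kernel can emit IN_DELETE_SELF.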