From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Hangs on Filesystem Operations on Stale NFS
Date: Mon, 11 Jun 2018 11:04:47 -0400
References: <1727545582523435cab149c2bc857b40@alexander.shukaev.name>
To: emacs-devel@gnu.org

> this discussion. I still find this issue very disruptive. Yet another
> example would be `recentf-cleanup' which is in my case triggered on Emacs
> start up, when the file comes from stale NFS, the corresponding
> `file-readable-p' down the stack will hang indefinitely, and there would be
> no way to unfreeze it apart from issuing 'kill -9' to that Emacs instance.

Indeed, stale NFS mounts can be problematic. As you can see from
Andreas's reaction, the obvious first answer is that it's a general
problem, so I think we first need to understand what makes it
different in the context of Emacs.

I don't use NFS much these days, but IIRC there are basically two
different ways to do NFS mounts: "hard" and "soft". Back when I used
it, "hard" was used with "intr" so you could interrupt frozen
processes, but from what I read, the Linux kernel's NFS client
nowadays doesn't support this any more, so a process waiting for a
hard-mounted NFS server can only be interrupted with a SIGKILL.

So, some questions to better understand what our options are:

- It seems your unreliable NFS server is mounted "hard" rather than
  "soft". Why is that? "man mount" on my Debian machine doesn't find
  any "hard" or "soft" options, so has the soft-mount option
  disappeared? What are applications usually expected to do when
  accessing a stale NFS server?
- You say "kill -9" is the only option, yet you also seem to say that
  SIGALRM does work. The two statements seem contradictory. What is
  the set of signals that actually work? E.g., does `kill -USR1` work
  (with debug-on-event)? Maybe the issue here is that Emacs handles
  C-g via polling rather than via interrupts, and we should refine
  that polling so that it handles "C-g in the middle of a long-running
  file-access syscall"?

> Well, enough rant. I think I have a proposal how to fix the issue, even
> given the blocking nature of Emacs. How about introducing a variable
> `file-access-timeout' defaulting to `nil', which would

If at all possible, I'd prefer to let the user interrupt with C-g
rather than rely on some kind of timeout.

Reading the original thread, you seem to say that this mostly affects
dired operations, and that not only can they hang, they can also be
slow. So a few more questions:

- In my experience, dired-like operations on NFS either run at normal
  speed or hang (if the server is unavailable); "slow" is not
  something I'd expect. Do you know why it's sometimes slow? Is your
  NFS server itself far away or slow? Is the slowness due to some
  automount (i.e. it's slow because of the time taken to perform the
  mount itself)?
- Does this slow/hanging behavior appear only in dired? Does it only
  affect Emacs when using dired on a directory that's indeed on an NFS
  server, or does it also affect accesses that don't obviously require
  NFS access?

Stefan
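For comparison, a timeout-based guard in the general spirit of the quoted `file-access-timeout` idea might look like the Python sketch below. The quoted message is cut off, so this does not reconstruct what that proposal would actually do; `call_with_timeout`, `TIMED_OUT`, and `file_readable_p` are hypothetical names, `os.access` stands in for the check behind Emacs's `file-readable-p`, and `timeout=None` (wait forever) mirrors a nil default.

```python
import os
import threading

# Sentinel meaning "the call did not finish within the timeout".
TIMED_OUT = object()


def call_with_timeout(fn, args=(), timeout=None):
    """Run fn(*args) in a daemon thread and wait at most `timeout` seconds.

    timeout=None waits forever. On timeout the worker thread is abandoned,
    not killed: if the syscall is stuck on a hard NFS mount, that thread
    stays blocked in the kernel.
    """
    result = {}

    def worker():
        try:
            result["value"] = fn(*args)
        except Exception as exc:  # re-raised in the caller below
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return TIMED_OUT
    if "error" in result:
        raise result["error"]
    return result["value"]


def file_readable_p(path, timeout=None):
    """Hypothetical timeout-aware analogue of `file-readable-p`."""
    return call_with_timeout(os.access, (path, os.R_OK), timeout)
```

The caller regains control once the timeout expires, but the blocked operation is not cancelled: each timed-out call on a stale mount leaves a thread stuck behind it.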